Back to browse
GitHub Repository

Verified AI infrastructure for regulated deployment. UltraCompress (our wedge): near-lossless 5-bit compression with SHA-256-reproducible reconstruction - prove the model in production is the one you validated. 23 architectures (0.6B-405B), Hermes-3-405B @ 1.0066x. OpenAI-compatible API. pip install ultracompress

13 starsPython

UltraCompress – first mathematically lossless 5-bit LLM compression

by mounnar·May 8, 2026·6 points·0 comments

AI Analysis

●●●BangerWizardryBig Brain

Runs 405B model compression on a single 32GB GPU when others need enterprise clusters.

Strengths
  • Streaming architecture processes layers sequentially to bypass VRAM capacity limits.
  • Achieves 1.0066 mean perplexity ratio across dense and MoE architectures up to 405B.
  • Per-layer low-rank correction training recovers accuracy lost during 5-bit quantization.
Weaknesses
  • Requires CUDA hardware with at least 16GB VRAM, excluding older consumer cards.
  • Compression process still takes significant time despite streaming optimizations.
Category
Target Audience

ML engineers, researchers running local LLMs

Similar To

bitsandbytes · AWQ · GGUF

Similar Projects