GitHub Repository

Verified AI infrastructure for regulated deployment. UltraCompress (our wedge): near-lossless 5-bit compression with SHA-256-reproducible reconstruction - prove the model in production is the one you validated. 23 architectures (0.6B-405B), Hermes-3-405B @ 1.0066x. OpenAI-compatible API. pip install ultracompress

13 starsPython

UltraCompress – first mathematically lossless 5-bit LLM compression

Name: UltraCompress – first mathematically lossless 5-bit LLM compression
Availability: InStock
Author: mounnar

by mounnar·May 8, 2026·6 points·0 comments

Visit Project View on HN

AI Analysis

●●●BangerWizardryBig Brain

Runs 405B model compression on a single 32GB GPU when others need enterprise clusters.

Strengths

•Streaming architecture processes layers sequentially to bypass VRAM capacity limits.
•Achieves 1.0066 mean perplexity ratio across dense and MoE architectures up to 405B.
•Per-layer low-rank correction training recovers accuracy lost during 5-bit quantization.

Weaknesses

•Requires CUDA hardware with at least 16GB VRAM, excluding older consumer cards.
•Compression process still takes significant time despite streaming optimizations.