GitHub Repository

Standalone TurboQuant KV Cache Inference for https://huggingface.co/g023/Qwen3-1.77B-g023

4 stars · Python

Standalone TurboQuant KV Cache Inference

Author: g023

by g023 · Apr 3, 2026 · 3 points · 4 comments


AI Analysis

● Mid · Big Brain · Ship It

Standalone KV cache compression script that implements TurboQuant and reports a 1.55x compression ratio.

Strengths

• Self-contained implementation of complex quantization math like Lloyd-Max and QJL.
• Minimal dependencies make it easy to audit the actual inference logic.
• Demonstrates specific memory savings on a custom Qwen3 model variant.
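
The listing credits the script with a self-contained Lloyd-Max implementation, but the script's code is not reproduced here. As a rough illustration of what a Lloyd-Max scalar quantizer does, the sketch below fits a 4-level (2-bit) codebook to Gaussian samples using only the standard library. The function names, the Gaussian test data, and the iteration count are illustrative assumptions, not taken from the repository.

```python
import random

def lloyd_max(samples, levels, iters=50):
    """Fit `levels` reconstruction points to 1-D samples by Lloyd-Max iteration."""
    data = sorted(samples)
    n = len(data)
    # Start from evenly spaced quantiles of the data.
    centroids = [data[int((i + 0.5) * n / levels)] for i in range(levels)]
    for _ in range(iters):
        # Decision boundaries sit midway between neighbouring centroids.
        bounds = [(a + b) / 2 for a, b in zip(centroids, centroids[1:])]
        sums = [0.0] * levels
        counts = [0] * levels
        k = 0
        for x in data:  # data is sorted, so the cell index only moves forward
            while k < levels - 1 and x > bounds[k]:
                k += 1
            sums[k] += x
            counts[k] += 1
        # Each centroid moves to the mean of its cell (unchanged if the cell is empty).
        centroids = [s / c if c else m for s, c, m in zip(sums, counts, centroids)]
    return centroids

def quantize(x, centroids):
    """Return the index of the nearest reconstruction point."""
    return min(range(len(centroids)), key=lambda i: abs(x - centroids[i]))

# Hypothetical test data: unit-variance Gaussian samples.
rng = random.Random(0)
samples = [rng.gauss(0.0, 1.0) for _ in range(20000)]
codebook = lloyd_max(samples, levels=4)
mse = sum((x - codebook[quantize(x, codebook)]) ** 2 for x in samples) / len(samples)
print(codebook, mse)
```

Only the 2-bit index from `quantize` needs to be stored per value; the codebook is shared, which is where the memory saving comes from.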

Weaknesses

• Manual file management ("throw in folder") adds friction compared with pip-installable tools.
• A 1.55x compression ratio trails common baselines such as INT4 or FP4 quantization.
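
To put the ratio comparison in numbers, the sketch below converts a compression ratio into effective bits per stored value and computes the nominal ratio of a plain low-bit format. It assumes a 16-bit baseline cache, which the listing does not state, and the helper names and the group-scale overhead model are hypothetical.

```python
def effective_bits(baseline_bits, ratio):
    """Average bits per value implied by a compression ratio."""
    return baseline_bits / ratio

def nominal_ratio(baseline_bits, quant_bits, group_size=None, scale_bits=16):
    """Ratio for a low-bit format, optionally counting one scale per group."""
    overhead = scale_bits / group_size if group_size else 0.0
    return baseline_bits / (quant_bits + overhead)

# Assumption: the uncompressed cache stores 16-bit values.
print(effective_bits(16, 1.55))              # about 10.3 bits per value
print(nominal_ratio(16, 4))                  # 4.0 for plain 4-bit storage
print(nominal_ratio(16, 4, group_size=32))   # about 3.56 with a 16-bit scale per 32 values
```

Under that assumption, 1.55x works out to roughly 10 bits per cached value, while a 4-bit format lands between about 3.5x and 4x depending on metadata overhead.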

Post Description

Implements TurboQuant (ICLR 2026, arXiv:2504.19874) KV cache compression directly inside a Transformers inference script. All algorithms are self-contained. Minimal dependencies.

- Uses https://huggingface.co/g023/Qwen3-1.77B-g023 as the demonstration model (place the model files in the Qwen3-BEST folder)
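
The review above names QJL as one of the algorithms the script implements. As a hedged illustration of that idea (not the repository's code), the sketch below stores only the sign of each random Gaussian projection of a key vector, plus its norm, and then estimates the inner product with an unquantized query. All names, the dimensions, and the projection count are assumptions chosen for the demo.

```python
import math
import random

def qjl_encode(x, proj):
    """Keep only the sign of each random projection of x, plus the norm of x."""
    signs = [1 if sum(s * v for s, v in zip(row, x)) >= 0 else -1 for row in proj]
    norm = math.sqrt(sum(v * v for v in x))
    return signs, norm

def qjl_inner_product(signs, norm, y, proj):
    """Estimate <x, y> from the sign bits of x and an unquantized y."""
    m = len(proj)
    acc = sum(sg * sum(s * v for s, v in zip(row, y)) for sg, row in zip(signs, proj))
    # For Gaussian rows, E[sign(s.x) * (s.y)] = sqrt(2/pi) * <x, y> / ||x||,
    # so rescaling by sqrt(pi/2) * ||x|| gives an unbiased estimate.
    return math.sqrt(math.pi / 2) * norm * acc / m

rng = random.Random(0)
d, m = 16, 4000  # hypothetical sizes; m is large only to make convergence visible
proj = [[rng.gauss(0, 1) for _ in range(d)] for _ in range(m)]
key = [rng.gauss(0, 1) for _ in range(d)]
query = [rng.gauss(0, 1) for _ in range(d)]

signs, norm = qjl_encode(key, proj)
estimate = qjl_inner_product(signs, norm, query, proj)
exact = sum(a * b for a, b in zip(key, query))
print(estimate, exact)
```

With 4000 sign bits for a 16-dimensional vector this demo does not compress anything; it only shows that the estimator tracks the exact inner product. A practical setting would use far fewer projections per vector and accept a noisier estimate.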

Similar Projects

AI/ML · ●● Solid

TurboQuant for vector search – 2-4 bit compression

Data-oblivious quantization beats Product Quantization on online updates.

Big Brain · Niche Gem

justsomeguy1996

8964mo ago

AI/ML · ●●● Banger

TurboQuant for mlx-lm (Apple Silicon)

Custom Metal kernels bring Google's TurboQuant KV-cache compression to Apple Silicon.

Wizardry · Solve My Problem

pythongiant

1124d ago

AI/ML · ●●● Banger

TurboQuant-WASM – Google's vector quantization in the browser

Google's ICLR 2026 quantization paper running client-side with SIMD-accelerated dot products.

Wizardry · Zero to One

teamchong

16573mo ago

Developer Tools · ●● Solid

simple_ans – Asymmetric Numeral Systems Compression in Python/C++

Single-file C++ ANS kernel beats wrestling with zstandard for quantized data.

Niche Gem · Cozy

jmagland

313mo ago

Open Source · ● Mid

Algorithms 1.0.0 – Minimal and clean implementations of algorithms

Files are single-purpose and readable: each algorithm comes with docstrings, type hints, complexity notes and runnable examples so you can read, test, or pip-install bits immediately. It isn't breaking new ground — algorithm collections are common — but the focus on clarity, tests, and a tiny surface API (merge_sort, BinaryHeap, dijkstra, etc.) makes this a reliable reference and teaching aid.

Niche Gem · Crowd Pleaser

kwk236

705mo ago

AI/ML · ●● Solid

Mamba SSM in Rust – training and inference with custom CUDA kernels

Custom CUDA kernels for SSM recurrence with zero framework dependencies.

Wizardry · Niche Gem

silvermpx

104mo ago