GitHub Repository

AVX-512 fused NF4 dequantization + matrix multiplication for local LLM inference.
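The core idea in the description is to dequantize 4-bit NF4 weights inside the matmul loop instead of first expanding the whole matrix to floats. The plain-Python sketch below shows that idea. It is not the project's AVX-512 code. Several details are assumptions made for illustration: the block size of 64, the nibble packing order, and every function name. The 16-entry table is the NF4 code book as published with QLoRA/bitsandbytes. In the fused path, each block's partial dot product is scaled only once. That is the same trick a vectorized kernel can use to keep weights as packed nibbles until they are needed.

```python
import random

# NF4 code book as published with QLoRA / bitsandbytes (assumed here):
# 16 levels derived from normal-distribution quantiles, normalized to [-1, 1].
NF4_LEVELS = [
    -1.0, -0.6961928009986877, -0.5250730514526367, -0.39491748809814453,
    -0.28444138169288635, -0.18477343022823334, -0.09105003625154495, 0.0,
    0.07958029955625534, 0.16093020141124725, 0.24611230194568634,
    0.33791524171829224, 0.44070982933044434, 0.5626170039176941,
    0.7229568362236023, 1.0,
]
BLOCK = 64  # hypothetical block size: one absmax scale per 64 weights


def quantize_row(row):
    """Quantize one weight row to packed NF4 bytes plus one scale per block."""
    assert len(row) % BLOCK == 0
    packed, scales = bytearray(), []
    for start in range(0, len(row), BLOCK):
        block = row[start:start + BLOCK]
        scale = max(abs(v) for v in block) or 1.0  # absmax; guard all-zero blocks
        scales.append(scale)
        # nearest NF4 level index (0..15) for each normalized weight
        idx = [min(range(16), key=lambda k: abs(v / scale - NF4_LEVELS[k]))
               for v in block]
        # pack two 4-bit indices per byte: even element in the low nibble
        packed.extend(idx[i] | (idx[i + 1] << 4) for i in range(0, BLOCK, 2))
    return packed, scales


def fused_dot(packed, scales, x):
    """One output element of W @ x, dequantizing on the fly (no float copy of W)."""
    acc = 0.0
    for b, scale in enumerate(scales):
        partial = 0.0
        for j in range(BLOCK // 2):
            byte = packed[b * BLOCK // 2 + j]
            k = b * BLOCK + 2 * j
            partial += NF4_LEVELS[byte & 0x0F] * x[k]    # low nibble
            partial += NF4_LEVELS[byte >> 4] * x[k + 1]  # high nibble
        acc += scale * partial  # apply the block scale once per block
    return acc


def fused_matvec(rows, x):
    """rows: list of (packed, scales) pairs, one per output row."""
    return [fused_dot(p, s, x) for p, s in rows]


def dequantize_row(packed, scales):
    """Reference path: materialize float weights, used only to check the fused result."""
    out = []
    for i, byte in enumerate(packed):
        scale = scales[2 * i // BLOCK]
        out += [NF4_LEVELS[byte & 0x0F] * scale, NF4_LEVELS[byte >> 4] * scale]
    return out


random.seed(0)
W = [[random.gauss(0, 1) for _ in range(128)] for _ in range(4)]
x = [random.gauss(0, 1) for _ in range(128)]
q = [quantize_row(r) for r in W]
y = fused_matvec(q, x)
y_ref = [sum(w * xi for w, xi in zip(dequantize_row(p, s), x)) for p, s in q]
print(max(abs(a - b) for a, b in zip(y, y_ref)))
```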

14 stars · Python

Evolved x86 AVX-512 kernels for NF4 LLM inference

by Anuar81·Feb 16, 2026·2 points·0 comments

AI Analysis

●●● Banger · Wizardry · Big Brain · Niche Gem

Genetic algorithm evolves x86 kernels; runs an 80B MoE model on a single GPU with CPU offload.
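The summary says a genetic algorithm evolves the kernels. The page does not say what the genome or the fitness function is. The sketch below therefore shows only the generic loop of selection, crossover, and mutation with elitism. It runs over a hypothetical search space of tile sizes and unroll factors. A real search would compile and time each candidate kernel on the target CPU. Here `cost` is a made-up formula so the example runs anywhere. All names and option lists are illustrative assumptions.

```python
import random

# Hypothetical search space for a kernel's tunable knobs (illustrative only).
TILE_ROWS = [1, 2, 4, 8, 16]
TILE_COLS = [16, 32, 64, 128]
UNROLL = [1, 2, 4, 8]
SPACE = [TILE_ROWS, TILE_COLS, UNROLL]


def cost(genome):
    """Stand-in for benchmarking a compiled kernel; lower is better. Made up."""
    rows, cols, unroll = genome
    # pretend the sweet spot is 4 x 64 tiles with unroll 4
    return abs(rows - 4) + abs(cols - 64) / 16 + abs(unroll - 4) / 2


def random_genome(rng):
    return tuple(rng.choice(opts) for opts in SPACE)


def crossover(a, b, rng):
    # uniform crossover: each gene comes from either parent with equal odds
    return tuple(x if rng.random() < 0.5 else y for x, y in zip(a, b))


def mutate(genome, rng, rate=0.2):
    # with probability `rate`, resample a gene from its option list
    return tuple(rng.choice(opts) if rng.random() < rate else x
                 for x, opts in zip(genome, SPACE))


def evolve(generations=30, pop_size=20, seed=0):
    rng = random.Random(seed)
    pop = [random_genome(rng) for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=cost)
        elite = pop[: pop_size // 4]       # keep the cheapest quarter unchanged
        children = []
        while len(elite) + len(children) < pop_size:
            a, b = rng.sample(elite, 2)    # pick two parents from the elite
            children.append(mutate(crossover(a, b, rng), rng))
        pop = elite + children
    return min(pop, key=cost)


best = evolve()
print(best, cost(best))
```

Because the elite is carried over unchanged each generation, the best candidate found so far never gets worse. That matters when every fitness evaluation is an expensive benchmark run.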

Strengths
  • Novel approach: x86 kernels discovered by evolutionary search rather than written by hand, demonstrating genuine optimization insight
  • Extreme performance gain (165x) with rigorous hardware validation on real models (Qwen3-80B)
  • Solves a real constraint: makes quantized MoE inference practical on consumer hardware without CUDA
Weaknesses
  • Tiny audience: requires NF4 quantization + x86 + Zen 4+ CPU + specific inference patterns
  • High barrier to adoption: limited to specific model formats and hardware; no Windows support evident
Target Audience

Machine learning engineers optimizing local LLM inference on x86 CPUs; MoE model researchers

Similar To

bitsandbytes · GPTQ · AWQ

Similar Projects

AI/ML · ●●● Banger

Glq: LLM quantization using E8 lattice

E8 lattice codebooks beat GPTQ at 2-4 bpw with fused CUDA kernel skipping weight materialization.

Wizardry · Big Brain
acd
2012d ago
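The Glq card mentions E8 lattice codebooks. Whatever codebook construction Glq actually uses, snapping an 8-dimensional vector to its nearest E8 lattice point is a basic operation in lattice quantization. The minimal sketch below implements the classic Conway–Sloane decoder from the textbook definition E8 = D8 ∪ (D8 + ½). Here D8 is the set of integer vectors with an even coordinate sum. The function names are hypothetical, and this is not Glq's code. The sketch also leaves out how weights would be grouped into 8-vectors and scaled before snapping.

```python
import math
import random


def nearest_d8(x):
    """Nearest point of D8 (integer vectors with even coordinate sum)."""
    f = [math.floor(v + 0.5) for v in x]  # round each coordinate
    if sum(f) % 2 == 0:
        return f
    # odd sum: re-round the coordinate with the largest rounding error the other way
    i = max(range(len(x)), key=lambda k: abs(x[k] - f[k]))
    f[i] += 1 if x[i] > f[i] else -1
    return f


def nearest_e8(x):
    """Nearest E8 point: the closer of the D8 candidate and the D8 + 1/2 candidate."""
    a = nearest_d8(x)
    b = [v + 0.5 for v in nearest_d8([v - 0.5 for v in x])]

    def dist2(p):
        return sum((u - v) ** 2 for u, v in zip(p, x))

    return a if dist2(a) <= dist2(b) else b


def in_e8(p):
    """Membership test: all-integer or all-half-integer coordinates, even sum."""
    doubled = [2 * v for v in p]
    all_int = all(d % 2 == 0 for d in doubled)
    all_half = all(d % 2 == 1 for d in doubled)
    return (all_int or all_half) and sum(p) % 2 == 0


rng = random.Random(1)
pts = [[rng.uniform(-3, 3) for _ in range(8)] for _ in range(200)]
snapped = [nearest_e8(p) for p in pts]
print(all(in_e8(s) for s in snapped))
```

Every input lands within squared distance 1 of its snapped point, because 1 is the covering radius of E8 in this standard scaling.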