GitHub Repository

AI-generated x86-64 assembly vs GCC -O3 on production kernels. 4.8-6.3x on base64, verified with 300K fuzz iterations.

2 stars · Python

AI-optimized x86-64 assembly vs. GCC -O3 on three production kernels

by cod-e·Feb 15, 2026·1 point·1 comment

AI Analysis

●● Solid · Wizardry · Big Brain

PSHUFB nibble trick beats GCC's lookup table by 4.8–6.3x on base64; reproducible fuzz methodology.

Strengths
  • Differential fuzzing (300K iterations, zero mismatches) validates correctness rigorously—not hand-waved.
  • Real-world kernels (base64, LZ4, SipHash) from production codebases, not toy examples.
  • SSSE3 pshufb insight (gathering via shuffle instead of table) is a genuine algorithmic win.
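The differential fuzzing credited in the strengths above can be sketched roughly as follows. This is a minimal illustration, not the repository's harness, which the post does not show: `candidate_decode` and `differential_fuzz` are hypothetical names, the stand-in candidate is a plain table-driven decoder, and Python's standard `base64.b64decode` plays the role of the trusted reference.

```python
import base64
import random

ALPHABET = b"ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/"
TABLE = {c: i for i, c in enumerate(ALPHABET)}


def candidate_decode(text):
    """Hypothetical implementation under test: decodes unpadded base64."""
    out = bytearray()
    for i in range(0, len(text), 4):
        a, b, c, d = (TABLE[ch] for ch in text[i:i + 4])
        out += ((a << 18) | (b << 12) | (c << 6) | d).to_bytes(3, "big")
    return bytes(out)


def differential_fuzz(reference, candidate, iterations, seed=0):
    """Feed identical random inputs to both decoders; return the inputs where they disagree."""
    rng = random.Random(seed)  # fixed seed keeps any failure reproducible
    mismatches = []
    for _ in range(iterations):
        # Lengths are multiples of 3 so the encoded form carries no '=' padding.
        length = 3 * rng.randrange(0, 64)
        raw = bytes(rng.getrandbits(8) for _ in range(length))
        encoded = base64.b64encode(raw)
        if reference(encoded) != candidate(encoded):
            mismatches.append(encoded)
    return mismatches


if __name__ == "__main__":
    failures = differential_fuzz(base64.b64decode, candidate_decode, 1000)
    print(f"{len(failures)} mismatches")
```

The fixed seed is a design choice for reproducibility: a mismatch found once can be replayed exactly. A real harness for an assembly kernel would call the compiled routines instead of Python functions, but the compare-on-random-input loop has the same shape.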
Weaknesses
  • Niche audience: only systems programmers optimizing hot paths benefit from hand-rolled asm.
  • No tool/framework for reproducible AI-asm generation; results are a blog post, not a usable product.
Target Audience

Low-level systems programmers, compiler engineers, AI researchers

Similar To

Superoptimizer (classical) · STOKE (superoptimization tool)

Post Description

Show HN: AI-generated assembly vs GCC -O3 on real codebases (300K fuzz, 0 failures)

Three kernels extracted from real open source projects, optimized with AI-generated x86-64 assembly, verified with 100K differential fuzz each:

Kernel | AI strategy | Speedup | Verdict
Base64 decode | SSSE3 pshufb table-free lookup | 4.8–6.3x | AI wins
LZ4 fast decode | SSE 16-byte match copy | ~1.05x | AI wins (marginal)
Redis SipHash | Reordered SIPROUND scheduling | 0.97x | GCC wins

The base64 win: GCC can't auto-vectorize a 256-byte lookup table (it's a gather pattern). The AI replaces it with a pshufb nibble trick: 16 parallel lookups in one instruction, zero table accesses. 1.8 GB/s → 11.6 GB/s.

The SipHash loss: on pure ALU kernels (adds, rotates, XORs), GCC's scheduler is already near-optimal.

300K total fuzz iterations, zero mismatches. Every result is one command to reproduce.
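To make the pshufb nibble trick concrete, here is a scalar Python model of one common formulation used in public SIMD base64 decoders: the high nibble of each ASCII byte indexes a 16-entry offset table, and adding that offset yields the 6-bit value. Whether the repository's assembly uses exactly this table is an assumption; the names `ROLL`, `pshufb`, `sextets16`, and `decode` are illustrative, and input validation and padding handling are omitted.

```python
import base64

# Offset table indexed by the high nibble of an ASCII byte (stored as unsigned bytes).
# Index 1 is reserved for '/', index 2 for '+', 3 for digits, 4-5 for A-Z, 6-7 for a-z.
ROLL = bytes(v & 0xFF for v in (0, 16, 19, 4, -65, -65, -71, -71, 0, 0, 0, 0, 0, 0, 0, 0))


def pshufb(table, indices):
    """Scalar model of SSSE3 pshufb: each lane picks table[idx & 0x0F], or 0 if bit 7 of idx is set."""
    return bytes(0 if i & 0x80 else table[i & 0x0F] for i in indices)


def sextets16(block):
    """Map 16 valid base64 characters to their 6-bit values without a 256-byte table."""
    assert len(block) == 16
    # '/' (0x2F) shares high nibble 2 with '+' (0x2B), so its index is shifted down to 1.
    idx = [(c >> 4) - 1 if c == 0x2F else c >> 4 for c in block]
    roll = pshufb(ROLL, idx)  # on hardware this is one instruction for all 16 lanes
    return bytes((c + r) & 0xFF for c, r in zip(block, roll))


def decode(text):
    """Decode unpadded base64 whose length is a multiple of 16 (assumed valid input)."""
    out = bytearray()
    for i in range(0, len(text), 16):
        s = sextets16(text[i:i + 16])
        for j in range(0, 16, 4):
            word = (s[j] << 18) | (s[j + 1] << 12) | (s[j + 2] << 6) | s[j + 3]
            out += word.to_bytes(3, "big")
    return bytes(out)


if __name__ == "__main__":
    sample = bytes(range(48))  # 48 bytes encode to 64 characters with no padding
    assert decode(base64.b64encode(sample)) == sample
```

The point of the model is that the alphabet mapping needs only a 16-byte table that fits in a single vector register, so the per-character memory lookups that block auto-vectorization disappear.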

Similar Projects

AI/ML · ●●● Banger

Auto GPU Kernel – Autonomous GPU-kernel discovery and optimizer

Autonomous kernel optimizer that won MLSys contest with 34.93x speedup.

Wizardry · Big Brain · Bold Bet
dogacel
1010d ago