SRA – A new architectural pattern for modern product engineering
Well-reasoned three-tier architecture, but lacks reference implementations and adoption proof.
AI-generated x86-64 assembly vs. GCC -O3 on production kernels: 4.8–6.3x faster on base64, verified with 300K fuzz iterations.
PSHUFB nibble trick beats GCC's lookup table by 4.8–6.3x on base64; reproducible fuzz methodology.
Low-level systems programmers, compiler engineers, AI researchers
Superoptimizer (classical) · STOKE (superoptimization tool)
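The blurb above does not spell out the PSHUFB trick or the fuzz harness, so the following is only a rough sketch of one well-known formulation, not the project's actual code. It assumes the "lookup" is the 6-bit-to-ASCII translation step of base64 encoding, replaced by a 16-entry offset table that PSHUFB can index with a 4-bit value. The instruction is modelled in scalar Python, and the result is checked against a plain alphabet lookup with a small differential fuzz loop. All names (`OFFSETS`, `pshufb`, `translate`, `fuzz`) are hypothetical.

```python
import random

ALPHABET = b"ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/"

# 16-entry offset table (assumed layout): the value added to a 6-bit index
# to reach its ASCII code. 16 entries is what one PSHUFB can address.
OFFSETS = [65, 71] + [-4] * 10 + [-19, -16, 0, 0]


def pshufb(table, indices):
    """Scalar model of PSHUFB: each index byte selects table[index & 0x0F],
    or yields 0 when the index byte has its high bit set."""
    return [0 if i & 0x80 else table[i & 0x0F] for i in indices]


def translate(sextets):
    """Map 6-bit values (0..63) to base64 ASCII via the offset table."""
    # Models PSUBUSB (saturating subtract): 0 for 0..51, 1..12 for 52..63.
    reduced = [max(s - 51, 0) for s in sextets]
    # Models PCMPGTB + PSUBB: add 1 wherever the value is above 25, so
    # 'A'..'Z' -> 0, 'a'..'z' -> 1, digits -> 2..11, '+' -> 12, '/' -> 13.
    reduced = [r + (1 if s > 25 else 0) for r, s in zip(reduced, sextets)]
    offsets = pshufb(OFFSETS, reduced)
    # Models PADDB: byte-wise add of the selected offset.
    return bytes((s + o) & 0xFF for s, o in zip(sextets, offsets))


def reference(sextets):
    """Plain 64-entry table lookup, used as the oracle."""
    return bytes(ALPHABET[s] for s in sextets)


def fuzz(iterations=1000, seed=0):
    """Differential fuzz: return the first mismatching block, or None."""
    rng = random.Random(seed)
    for _ in range(iterations):
        block = [rng.randrange(64) for _ in range(16)]
        if translate(block) != reference(block):
            return block
    return None
```

Calling `fuzz()` returns `None` when the two implementations agree on every random 16-byte block. A real harness for the assembly kernel would compare full encode output against a trusted encoder rather than just the translation step.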
Hand-tuned SSE particle engine from 2002 assembly, now runs in your browser via WASM.
6KB binary for an AI agent—fits on a floppy disk 62 times over.
Autonomous kernel optimizer that won MLSys contest with 34.93x speedup.
Beats PyTorch eager by 5.29x on RMSNorm using autonomous agent loops.
Genetic algorithm evolves x86 kernels; runs 80B MoE on single GPU with CPU offload.