Fastest(?) SIMD CSV Parser in Rust
Beats simd-csv with pclmulqdq trick, but CSV parsing is a solved category.
AVX-512 fused NF4 dequantization + matrix multiplication for local LLM inference.
Genetic algorithm evolves x86 kernels; runs an 80B MoE model on a single GPU with CPU offload.
Machine learning engineers optimizing local LLM inference on x86 CPUs; MoE model researchers
bitsandbytes · GPTQ · AWQ
INT4 inference engine beats llama.cpp on VRAM usage, but competes against established tools.
Favors explicit kernel control over TVM-style black boxes, but benchmarks show mixed results vs Transformers.js.
Fused int4 attention kernel on Metal keeps LLM speed constant as context grows.
30x faster cold start than vLLM with zero PyTorch dependencies.
E8 lattice codebooks beat GPTQ at 2-4 bpw with fused CUDA kernel skipping weight materialization.