AI/ML ●● Solid
Mamba SSM in Rust – training and inference with custom CUDA kernels
Custom CUDA kernels for SSM recurrence with zero framework dependencies.
Wizardry, Niche Gem
silvermpx
103mo ago
Genetic algorithm evolves x86 kernels; runs 80B MoE on single GPU with CPU offload.
INT4 inference engine beats llama.cpp on VRAM, but competing against established tools.
Explicit kernel control over TVM-style black boxes, but benchmarks show mixed wins vs Transformers.js.
Fused int4 attention kernel on Metal keeps LLM speed constant as context grows.
30x faster cold start than vLLM with zero PyTorch dependencies.