We built an LLM inference engine in pure Python – no PyTorch, no Triton
30x faster cold start than vLLM with zero PyTorch dependencies.
Pure C++23 LLM inference for Apple Silicon chips
GPT-2 inference in pure C#, allocating zero bytes per token, beats ONNX Runtime.
Metal rendering is nice, but Neovide already does GPU acceleration cross-platform.
Tile IR pipeline compiles Python kernels to Metal with automatic operator fusion for M1+.
Pure Go LLM inference with zero dependencies at 48 tok/s, genuinely novel for the Go ecosystem.
Fused int4 attention kernel on Metal keeps LLM speed constant as context grows.