Gemma 3 inference in pure C++ with Metal acceleration

Author: ybubnov

by ybubnov·Jul 4, 2026·2 points·1 comment

Similar Projects

AI/ML·●●● Banger

30x faster cold start than vLLM with zero PyTorch dependencies.

Wizardry·Big Brain·Zero to One

zyoraclub

201mo ago

AI/ML·●●● Banger

GPT-2 inference in pure C#, allocating zero bytes per token, beats ONNX Runtime.

Wizardry·Big Brain

dev-on-bike

111mo ago

Metal rendering is nice, but Neovide already does GPU acceleration cross-platform.

Cozy·Ship It

rainux

703mo ago

AI/ML·●●● Banger

Tile IR pipeline compiles Python kernels to Metal with automatic operator fusion for M1+.

Wizardry·Big Brain·Niche Gem

rayanht

2013d ago

AI/ML·●●● Banger

Pure Go LLM inference with zero dependencies at 48 tok/s; genuinely novel for the Go ecosystem.

Zero to One·Wizardry·Big Brain

computerex

203mo ago

AI/ML·●●● Banger

Fused int4 attention kernel on Metal keeps LLM speed constant as context grows.

Wizardry·Solve My Problem·Big Brain

christinetyip

102mo ago