WebGPU LLM inference comprehensive benchmark
Sequential-dispatch methodology corrects 20x overestimation in prior WebGPU benchmarks.

In-browser LLM inference on WebGPU is slick, but Ollama, LM Studio, and other local tools already work offline.
Developers exploring local LLM inference; users seeking private, no-signup chat without server dependency.
Ollama · LM Studio · Transformers.js
Runs GGUF models in the browser via custom WGSL shaders when cloud APIs ignore tiny models.
Runs autoresearch agents entirely in-browser using WebGPU and jax-js.
Favors explicit kernel control over TVM-style black boxes, but benchmarks show mixed results against Transformers.js.
Full voice agent (STT→LLM→TTS) runs locally on GPU, no backend needed.
Music stem separation in the browser without uploads—Rust + WebGPU beats cloud dependency.