Autoresearch-WebGPU uses agents to iteratively train LMs in the browser
Runs autoresearch agents entirely in-browser using WebGPU and jax-js.

Runs GGUF models in the browser via custom WGSL shaders when cloud APIs ignore tiny models.
Developers experimenting with edge AI, WebGPU enthusiasts, and privacy-focused users
svenflow/webgpu-gemma · WebLLM · Transformers.js
These are only so useful in a multi-turn conversation but it's still interesting to see what you can pack in a <250mb model.
I tried using ONNX versions earlier, but there were too many quirks of using them with language models and the TPS wasn't too impressive. Inspired by svenflow/webgpu-gemma, I put my codex and claude to the task of writing WGSL to run inference for GGUF versions of these models.
Once you load this website and a model, it should load offline too, until your browser evicts the model from the cache.
Runs autoresearch agents entirely in-browser using WebGPU and jax-js.
WebGPU LLM inference in-browser is slick, but Ollama, LM Studio, and local alternatives already work offline.
Ollama and llama.cpp server already do this with more maturity and model support.
Train a working LLM in 5 minutes on free Colab with a fish personality.
Beats Qwen2.5-VL-7B on temporal grounding while running on a single consumer GPU.
Useful tutorial, but llama.cpp docs and Ollama already cover most of this.