Run Qwen3.5 0.8B in browser (Web and Extension)

by tantara·Mar 7, 2026·1 point·0 comments

AI Analysis

●●● Banger · Zero to One · Wizardry · Ship It

Multimodal LLM inference in browser with no server, but WebGPU adoption is still narrow.

Strengths
  • True end-to-end local inference: no telemetry, no API calls, works offline once the model is cached.
  • Multimodal vision support (image understanding, drag-and-drop, multi-image) in a 0.8B model.
  • Both web app and Chrome extension; caching strategy means fast subsequent loads across sessions.
Weaknesses
  • WebGPU hardware support is fragmented; inference falls back to a slower CPU path on many machines.
  • 40 tok/s is slow for real work; users expecting ChatGPT-speed will bounce.
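
To make the two weaknesses concrete, here is a minimal TypeScript sketch of how a browser app might choose between WebGPU and a CPU fallback, plus the arithmetic behind the 40 tok/s figure. It is an illustration under stated assumptions, not TinyWhale's actual code: `pickBackend`, `secondsToGenerate`, the `"wasm"` label for the CPU path, and the small `NavigatorLike` type are all hypothetical names introduced here. The only real API relied on is `navigator.gpu.requestAdapter()`, which resolves to `null` when no suitable GPU adapter is available.

```typescript
// Minimal structural types so the sketch compiles without WebGPU typings.
interface GpuLike {
  requestAdapter(): Promise<object | null>;
}
interface NavigatorLike {
  gpu?: GpuLike;
}

type Backend = "webgpu" | "wasm";

// Hypothetical helper: choose "webgpu" only when the API exists AND an adapter
// is actually returned; otherwise fall back to the slower CPU ("wasm") path.
async function pickBackend(nav: NavigatorLike): Promise<Backend> {
  if (!nav.gpu) return "wasm"; // browser does not expose WebGPU at all
  try {
    const adapter = await nav.gpu.requestAdapter();
    return adapter ? "webgpu" : "wasm"; // null means no usable GPU adapter
  } catch {
    return "wasm"; // treat any failure as "no GPU"
  }
}

// Rough wall-clock estimate for generating `tokens` tokens at a steady rate.
function secondsToGenerate(tokens: number, tokensPerSecond: number): number {
  return tokens / tokensPerSecond;
}
```

In a page, the browser's `navigator` object would be passed to `pickBackend` (a cast may be needed depending on the TypeScript DOM typings in use). At the quoted 40 tok/s, `secondsToGenerate(500, 40)` gives 12.5 seconds for a 500-token reply, which is the kind of wait the review is pointing at.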
Category
Target Audience

Privacy-conscious users, developers testing local inference, Chrome extension users avoiding cloud LLMs

Similar To

Ollama · LocalAI · Hugging Face Transformers.js

Post Description

A year ago, I shared a simple demo of running LLMs locally in a Chrome extension. Today, I’m excited to share TinyWhale, a monorepo that lets you run Qwen 3.5 0.8B entirely on-device, both as a web application and a Chrome extension. I plan to support mobile and desktop apps in the same repo.

Similar Projects

AI/ML · Mid

Qwen Lens Studio – multimodal app on Qwen3.6-35B-A3B, runs on Ollama

Yet another multimodal wrapper when Cursor and Continue already dominate this space.

Ship It
vijgaurav
30 · 1mo ago
AI/ML · ●●● Banger

Qwen 3.5 running on a $300 Android phone – on-device, open source

Comprehensive offline AI suite (text, vision, images) on $300 phones—genuinely complete.

Zero to One · Wizardry
ali_chherawalla
6103mo ago