Run Qwen3.5 0.8B in browser (Web and Extension)

Author: tantara

by tantara·Mar 7, 2026·1 point·0 comments


AI Analysis


Multimodal LLM inference in the browser with no server, but WebGPU adoption is still narrow.

Strengths

•True end-to-end local inference: no telemetry, no API calls, and it works offline once the model is cached.
•Multimodal vision support (image understanding, drag-and-drop, multi-image) in a 0.8B model.
•Ships as both a web app and a Chrome extension; cached model files make subsequent loads fast across sessions.
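The "works offline after the first load" behavior usually comes down to a cache-first lookup for the model weights. The sketch below illustrates that pattern only; it is not taken from the TinyWhale repo. `WeightStore`, `loadWeights`, and the injected `download` function are hypothetical names, and in a browser the store could be backed by the Cache API or IndexedDB.

```typescript
// Hypothetical storage interface; a real app might wrap the Cache API or IndexedDB.
interface WeightStore {
  get(key: string): Promise<Uint8Array | undefined>;
  set(key: string, bytes: Uint8Array): Promise<void>;
}

interface LoadResult {
  bytes: Uint8Array;
  fromCache: boolean;
}

// Cache-first load: return stored bytes if present, otherwise download once and store them.
async function loadWeights(
  url: string,
  store: WeightStore,
  download: (url: string) => Promise<Uint8Array>
): Promise<LoadResult> {
  const cached = await store.get(url);
  if (cached !== undefined) {
    return { bytes: cached, fromCache: true };
  }
  const bytes = await download(url);
  await store.set(url, bytes);
  return { bytes, fromCache: false };
}
```

Injecting the store and the downloader keeps the function independent of any browser API, so the same logic can be exercised in a unit test with an in-memory map and reused from both a web page and an extension.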

Weaknesses

•WebGPU hardware support is fragmented; inference falls back to slower CPU execution on many machines.
•At 40 tok/s, generation is slow for real work; users expecting ChatGPT-level speed will bounce.
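The WebGPU-to-CPU fallback mentioned above can be sketched as a small capability check. This is an assumption about how such a check might look, not the project's actual code: `pickBackend`, `Backend`, and `GpuLike` are hypothetical names, and `GpuLike` is a minimal stand-in for the browser's `navigator.gpu` object, whose `requestAdapter()` resolves to `null` when no suitable adapter exists.

```typescript
// Minimal structural type so the sketch compiles without WebGPU type packages.
interface GpuLike {
  requestAdapter(): Promise<object | null>;
}

type Backend = "webgpu" | "cpu";

// Choose WebGPU only when an adapter is actually available; otherwise use the CPU path.
async function pickBackend(gpu: GpuLike | undefined): Promise<Backend> {
  if (!gpu) {
    return "cpu"; // the browser does not expose WebGPU at all
  }
  try {
    const adapter = await gpu.requestAdapter();
    return adapter ? "webgpu" : "cpu"; // null means no usable GPU adapter
  } catch {
    return "cpu"; // treat any adapter error as "not available"
  }
}
```

In a browser this would be called with `navigator.gpu`. Checking for a non-null adapter matters because the API can be present while the hardware or driver is unsupported.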

Post Description

A year ago, I shared a simple demo of running LLMs locally in a Chrome extension. Today, I’m excited to share TinyWhale, a monorepo that lets you run Qwen3.5 0.8B entirely on-device, both as a web application and a Chrome extension. I plan to support mobile and desktop apps in the same repo.