I built a React SDK to control apps with voice, gaze and gestures
React SDK fusing voice, gaze, and gestures into one interaction layer.

Runs everything in the browser and stays responsive: ONNX Runtime Web plus a YOLOX model handle subtle hand-seal distinctions that MediaPipe struggled with. Layering a T9 keypad over gesture input is a clever choice, since it reduces the number of gesture classes the detector needs and makes misreads tolerable. The demo is still an experiment, though: lighting sensitivity and similar-looking seals create real UX friction, and it is not yet a drop-in alternative input method.
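To make the T9 idea concrete, here is a minimal sketch of dictionary-based T9 lookup driven by gesture classes. It is not the project's code: the gesture names in GESTURE_TO_KEY, the helper names, and the tiny word list are all hypothetical. The point it illustrates is that eight gesture classes (one per keypad key) are enough to reach the whole alphabet.

```typescript
// Standard phone keypad letter groups for keys 2-9.
const KEYPAD: Record<string, string> = {
  "2": "abc", "3": "def", "4": "ghi", "5": "jkl",
  "6": "mno", "7": "pqrs", "8": "tuv", "9": "wxyz",
};

// Hypothetical mapping from detector class names to keypad keys.
const GESTURE_TO_KEY: Record<string, string> = {
  tiger: "2", ram: "3", boar: "4", dog: "5",
  bird: "6", snake: "7", ox: "8", hare: "9",
};

// Convert a word to its key sequence, e.g. "cat" -> "228".
// Characters that are not on the keypad are skipped.
function wordToKeys(word: string): string {
  let keys = "";
  for (const ch of word.toLowerCase().split("")) {
    for (const key of Object.keys(KEYPAD)) {
      if ((KEYPAD[key] || "").indexOf(ch) !== -1) keys += key;
    }
  }
  return keys;
}

// Index a word list by key sequence so lookup is a single property read.
function buildIndex(words: string[]): Record<string, string[]> {
  const index: Record<string, string[]> = {};
  for (const word of words) {
    const keys = wordToKeys(word);
    const bucket = index[keys] || [];
    bucket.push(word);
    index[keys] = bucket;
  }
  return index;
}

// Turn a sequence of detected gestures into candidate words.
// Returns an empty list if any gesture is unknown or nothing matches.
function candidatesFor(
  gestures: string[],
  index: Record<string, string[]>,
): string[] {
  let keys = "";
  for (const gesture of gestures) {
    const key = GESTURE_TO_KEY[gesture];
    if (key === undefined) return [];
    keys += key;
  }
  return index[keys] || [];
}
```

Because several words share a key sequence ("cat" and "act" both map to 228), the UI still needs a way to pick among candidates, which is the usual T9 trade-off.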
Web developers, ML/HCI hobbyists and researchers, and accessibility/interaction designers curious about in-browser gesture interfaces
It uses:
YOLOX for gesture detection
ONNX Runtime Web for in-browser inference
Plain JS for the UI
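One step that sits between the UI and the model in a stack like this is converting a canvas frame into the tensor layout the detector expects. The sketch below is an assumption about how that could look, not the project's code: it takes the interleaved RGBA bytes that a canvas getImageData call returns and produces planar channel-first (CHW) float data. The function name rgbaToChw is hypothetical, and whether the model wants 0-255 values, RGB or BGR order, or letterbox padding depends on how the YOLOX model was exported.

```typescript
// Convert interleaved RGBA pixels (as in ImageData.data) into planar
// CHW float32 data: all R values, then all G values, then all B values.
// Alpha is dropped. Values stay in the 0-255 range (an assumption; some
// exports expect normalized input instead).
function rgbaToChw(
  rgba: Uint8ClampedArray,
  width: number,
  height: number,
): Float32Array {
  const planeSize = width * height;
  if (rgba.length !== planeSize * 4) {
    throw new Error("pixel buffer length does not match width * height * 4");
  }
  const chw = new Float32Array(3 * planeSize);
  for (let i = 0; i < planeSize; i++) {
    chw[i] = rgba[i * 4];                     // R plane
    chw[planeSize + i] = rgba[i * 4 + 1];     // G plane
    chw[2 * planeSize + i] = rgba[i * 4 + 2]; // B plane
  }
  return chw;
}
```

With ONNX Runtime Web, the result could then be wrapped as new ort.Tensor("float32", data, [1, 3, height, width]) and passed to an InferenceSession's run method, keyed by the model's input name.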
The original goal was simple: Could I make real-time gesture-based input usable inside a browser without freezing the UI?
A few observations:
In-browser ML performance is better than I expected on modern laptops
Distinguishing similar seals (e.g. Tiger vs Ram) needed a stronger detector than MediaPipe provided; YOLOX performed noticeably better
Lighting consistency matters more than hand size
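One common way to soften the similar-seal and lighting problems noted above is to require agreement across several frames before accepting a gesture. The post does not say whether the demo does this, so treat the sketch below as an illustration only; the class name GestureStabilizer and its parameters are hypothetical.

```typescript
// Majority vote over a sliding window of per-frame labels.
// A label is reported only once it has at least `minVotes` of the
// last `windowSize` frames, which damps single-frame misreads.
class GestureStabilizer {
  private recent: string[] = [];
  private windowSize: number;
  private minVotes: number;

  constructor(windowSize: number, minVotes: number) {
    this.windowSize = windowSize;
    this.minVotes = minVotes;
  }

  // Feed one per-frame label; returns the stable label or null.
  push(label: string): string | null {
    this.recent.push(label);
    if (this.recent.length > this.windowSize) this.recent.shift();

    const counts: Record<string, number> = {};
    let best: string | null = null;
    let bestCount = 0;
    for (const l of this.recent) {
      const c = (counts[l] || 0) + 1;
      counts[l] = c;
      if (c > bestCount) {
        best = l;
        bestCount = c;
      }
    }
    return bestCount >= this.minVotes ? best : null;
  }
}
```

A larger window gives steadier output at the cost of latency, so the values would need tuning against the detector's actual frame rate.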
It’s obviously not production-grade, but it was an interesting exploration of browser-based vision input.
Curious what others think about gesture interfaces as alternative input systems.