GitHub Repository

On-device, real-time multimodal AI. Have natural voice and vision conversations with an AI that runs entirely on your machine. Powered by Gemma 4 E2B and Kokoro.

1,809 stars · HTML

Real-time AI (audio/video in, voice out) on an M3 Pro with Gemma E2B

by karimf · Apr 5, 2026 · 298 points · 38 comments

AI Analysis

●●● Banger · Wizardry · Dark Horse

Runs Gemma 4 E2B and Kokoro TTS locally with barge-in and vision.

Strengths
  • On-device processing eliminates server costs and privacy concerns for voice AI.
  • Barge-in and sentence-level streaming create natural conversation flow without pauses.
  • Combines vision and voice interaction in a single local stack.
Weaknesses
  • Requires an M3 Pro or a strong GPU, limiting accessibility on older hardware.
  • Research preview status means bugs and rough edges are expected.
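
The sentence-level streaming and barge-in listed under Strengths can be illustrated with a minimal Python sketch. This is not the project's code: the function names `stream_sentences` and `speak_until_interrupted`, the naive punctuation-based sentence rule, and the `speak` callback standing in for a TTS engine are all assumptions made for illustration.

```python
import re
import threading
from typing import Callable, Iterable, Iterator, List

# Naive sentence boundary (an assumption): ., ! or ? followed by whitespace.
_BOUNDARY = re.compile(r"(?<=[.!?])\s+")


def stream_sentences(tokens: Iterable[str]) -> Iterator[str]:
    """Yield each complete sentence as soon as it appears in a token stream."""
    buffer = ""
    for token in tokens:
        buffer += token
        parts = _BOUNDARY.split(buffer)
        # Every part except the last one is a finished sentence.
        for sentence in parts[:-1]:
            if sentence:
                yield sentence
        buffer = parts[-1]
    # Flush whatever remains once the model stops generating.
    if buffer.strip():
        yield buffer.strip()


def speak_until_interrupted(
    sentences: Iterable[str],
    speak: Callable[[str], None],
    interrupted: threading.Event,
) -> List[str]:
    """Speak sentences one by one and stop once the user barges in.

    `speak` is a hypothetical stand-in for a TTS call; `interrupted` would be
    set by whatever detects user speech (for example a voice activity detector).
    """
    spoken: List[str] = []
    for sentence in sentences:
        if interrupted.is_set():
            break  # barge-in: drop the rest of the reply
        speak(sentence)
        spoken.append(sentence)
    return spoken


if __name__ == "__main__":
    demo_tokens = ["Hel", "lo the", "re. ", "How are", " you? ", "I am", " fine"]
    speak_until_interrupted(stream_sentences(demo_tokens), print, threading.Event())
```

Handing text to the speech engine one sentence at a time lets audio start before the full reply has been generated. This sketch only checks for interruption between sentences; a real pipeline would also need to cut off audio that is already playing.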
Category
Target Audience

Developers building voice AI, privacy-focused users, language learners

Similar To

OpenAI Realtime API · LiveKit Agents · Bule AI

Similar Projects

AI/ML · ●●● Banger

Gemma 4 Multimodal Fine-Tuner for Apple Silicon

Only Apple Silicon toolkit streaming GCS data during audio fine-tuning without OOM.

Wizardry · Niche Gem · Zero to One
MediaSquirrel
235281mo ago
AI/ML · ●● Solid

Running Gemma 4 on an iPhone 13 Pro

Clean Swift wrapper for Gemma 4 with vision and audio on iPhone.

Niche Gem · Ship It
dengjiuhong
101mo ago