Sun – Realtime voice agent for group conversation, not just turn-taking
Multi-speaker voice model with natural interruption and barge-in prevention, genuinely different from turn-taking chatbots.
The addressee layer for voice agents: only speech meant for your agent reaches your STT, LLM, or TTS, on any stack.
Solves the multi-human voice AI problem without wake words using attention pattern detection.
Voice AI developers building multi-human or multi-agent conversational systems
Picovoice · Porcupine · Rhino
This is an SDK you put in front of your STT. It tells you whether your device is being spoken to, without a wake word. You can use it for:
- Single AI, multi-human
- Multi-AI, single human
- Multi-AI, multi-human (we recommend also adding a wake word on top for a better system)
There are two models: one that uses video + audio and one that uses audio only. Overall, it works by looking for shifts in attention patterns (body language changes, vocal patterns). It's a tough problem to nail, as every human being interacts with people and devices differently.
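To make the "before your STT" placement concrete, here is a rough Python sketch of an addressee gate. It is not the Sun SDK's API, which this post does not show. `Frame`, `addressed_score`, `gate_frames`, `threshold`, and `hangover` are all hypothetical names, and the score stands in for whatever signal the SDK actually reports.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator


@dataclass
class Frame:
    """One chunk of audio plus a hypothetical addressee score in [0, 1]."""
    audio: bytes
    addressed_score: float  # assumption: stand-in for the SDK's output


def gate_frames(
    frames: Iterable[Frame],
    threshold: float = 0.5,
    hangover: int = 2,
) -> Iterator[bytes]:
    """Yield only the audio judged to be addressed to the device.

    `hangover` keeps the gate open for a few frames after the score drops,
    so a brief dip mid-sentence does not cut the utterance short.
    """
    open_for = 0  # number of frames the gate will still let through
    for frame in frames:
        if frame.addressed_score >= threshold:
            open_for = hangover + 1  # this frame plus the hangover frames
        if open_for > 0:
            open_for -= 1
            yield frame.audio  # forward to STT; everything else is dropped


# Example: only the addressed stretch (plus two hangover frames) gets through.
scores = [0.1, 0.9, 0.8, 0.2, 0.1, 0.1, 0.1]
frames = [Frame(audio=str(i).encode(), addressed_score=s) for i, s in enumerate(scores)]
passed = list(gate_frames(frames))
```

In a real pipeline the yielded chunks would be fed to whatever STT you already use, so side conversations never reach the STT, LLM, or TTS.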
Let me know how it is!
Wakeword queries mid-conversation when Otter only handles scheduled calls.
Full voice agent (STT→LLM→TTS) runs locally on GPU, no backend needed.
Yet another AI text-to-speech wrapper; Eleven Labs, Google Cloud TTS exist.
Replicates Thinking Machines' multimodal demo on a CPU laptop with commodity models.
Another voice cloning platform when ElevenLabs already dominates this space.