Every Claw Deserves a Face
CPU-only face animation at 30 FPS when most agent embodiment tools require cloud GPUs.
Voice+video agent built by orchestrating Deepgram, ElevenLabs, LemonSlice, and LiveKit together.
AI agent developers building conversational interfaces with video components
Talkbase · Vocode · Dasha
Here's a sneak peek from the interview: https://x.com/ptservlor/status/2024597444890128767
The user speaks, Deepgram transcribes it, OpenClaw Gateway routes it to your agent, ElevenLabs turns the response into speech, and LemonSlice generates a lip-synced avatar from the audio. Everything streams over LiveKit in real time.
Latency is about 1 to 2 seconds end to end, depending on the LLM. The lip sync from LemonSlice honestly surprised us; it works way better than we expected.
The skill repo has a complete Python example, env setup, troubleshooting guide, and a Next.js frontend guide if you want to build your own web UI for it.
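As a rough illustration of the flow described above (transcribe, route to the agent, synthesize speech, animate the avatar), here is a minimal Python sketch. It is not the example from the skill repo and it calls no real SDK: every function name (`transcribe`, `route_to_agent`, `synthesize`, `animate`, `run_turn`) is a hypothetical stand-in for the corresponding service. It runs the stages one after another and times each one, whereas the real system streams over LiveKit. Per-stage timing is included only because the end-to-end latency depends on where the time goes, mostly the LLM step according to the post.

```python
import time
from typing import Any, Callable, Dict, List, Tuple

# NOTE: every stage below is a hypothetical placeholder, not a real SDK call.

def transcribe(audio: bytes) -> str:
    """Stand-in for speech-to-text (Deepgram in the post)."""
    return audio.decode("utf-8")

def route_to_agent(text: str) -> str:
    """Stand-in for the OpenClaw Gateway handing the text to the agent/LLM."""
    return f"You said: {text}"

def synthesize(text: str) -> bytes:
    """Stand-in for text-to-speech (ElevenLabs in the post)."""
    return text.encode("utf-8")

def animate(speech: bytes) -> Dict[str, Any]:
    """Stand-in for lip-synced avatar generation (LemonSlice in the post)."""
    return {"audio": speech, "frames": len(speech)}

# Ordered stages of a single conversational turn.
STAGES: List[Tuple[str, Callable[[Any], Any]]] = [
    ("stt", transcribe),
    ("agent", route_to_agent),
    ("tts", synthesize),
    ("avatar", animate),
]

def run_turn(audio: bytes) -> Tuple[Dict[str, Any], Dict[str, float]]:
    """Run one turn through every stage and record seconds spent per stage."""
    timings: Dict[str, float] = {}
    value: Any = audio
    for name, stage in STAGES:
        start = time.perf_counter()
        value = stage(value)
        timings[name] = time.perf_counter() - start
    return value, timings

if __name__ == "__main__":
    result, timings = run_turn(b"hello")
    print(result["audio"], {k: round(v, 6) for k, v in timings.items()})
```

Swapping a placeholder for a real client would keep the same shape: each stage takes the previous stage's output, so a slow stage shows up directly in `timings`.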
Full agent tool access on every utterance, unlike native realtime plugins.
Unified test harness for voice agents across Retell, VAPI, LiveKit, Bland with LLM scoring.
The repo treats memory and identity as first-class, using SOUL.md/AGENTS.md/MEMORY.md plus per-day markdown logs so an agent can literally "read yesterday" before answering. It is a clear, human-readable model that avoids opaque vector stores. Useful CLI commands (init, doctor, grow, reflect) show the author thought about ergonomics and maintenance, but integration with LLM runtimes is light and there is little evaluative evidence for the approach, so it's a pragmatic, opinionated toolkit rather than a breakthrough platform.
OpenClaw phone integration that actually works without webhook hell.
Self-modifying skills let agents persist new behaviors without restarts or config edits.
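The markdown memory model mentioned above (SOUL.md/AGENTS.md/MEMORY.md plus per-day logs that let an agent "read yesterday") can be pictured with a small Python sketch. The three file names come from the blurb. Everything else is an assumption made for illustration: the `memory/YYYY-MM-DD.md` layout for daily logs, and the helper names `daily_log_path` and `build_context`. The actual repo may organize its files differently.

```python
from datetime import date, timedelta
from pathlib import Path

# File names taken from the blurb; their exact location is an assumption.
IDENTITY_FILES = ("SOUL.md", "AGENTS.md", "MEMORY.md")

def daily_log_path(root: Path, day: date) -> Path:
    """Hypothetical layout: <root>/memory/YYYY-MM-DD.md (not confirmed by the source)."""
    return root / "memory" / f"{day.isoformat()}.md"

def build_context(root: Path, today: date) -> str:
    """Concatenate identity files and yesterday's log into one prompt preamble."""
    parts = []
    for name in IDENTITY_FILES:
        path = root / name
        if path.exists():  # skip files the agent has not created yet
            parts.append(f"## {name}\n{path.read_text(encoding='utf-8')}")
    yesterday = daily_log_path(root, today - timedelta(days=1))
    if yesterday.exists():
        text = yesterday.read_text(encoding="utf-8")
        parts.append(f"## Yesterday ({yesterday.stem})\n{text}")
    return "\n\n".join(parts)
```

Because everything is plain markdown on disk, the same files a person can open in an editor are the ones the agent reads, which is the human-readable property the blurb highlights.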