Stop Babysitting Your Agents
Cross-agent hook system with click-to-approve on macOS, but Linux gets audio only.
Spec-driven multi-agent orchestration — autonomous development workforce powered by Claude & OpenHands
DAG-based agent swarms with spec generation from codebase beat prompt chaining, but long-term reliability unproven.
Engineering teams wanting autonomous feature implementation with verification; agents as a productivity layer
Cursor · Continue.dev · Codeium
Attempt 1 - Claude/GPT directly: works for small stuff, but you re-explain context endlessly.
Attempt 2 - Copilot/Cursor: great autocomplete, but I'm still doing 95% of the thinking.
Attempt 3 - continuous agents: they keep working without prompting, but "no errors" doesn't mean "feature works."
Attempt 4 - parallel agents: faster wall-clock, but now you're manually reviewing even more output.
The common failure: nobody verifies whether the output satisfies the goal. That somebody was always me. So I automated that job.
OmoiOS is a spec-driven orchestration system. You describe a feature, and it:
1. Runs a multi-phase spec pipeline (Explore > Requirements > Design > Tasks) with LLM evaluators scoring each phase. Retry on failure, advance on pass. By the time agents code, requirements have machine-checkable acceptance criteria.
2. Spawns isolated cloud sandboxes per task. Your local env is untouched. Agents get ephemeral containers with full git access.
3. Validates continuously - a separate validator agent checks each task against acceptance criteria. Failures feed back for retry. No human in the loop between steps.
4. Discovers new work - validation can spawn new tasks when agents find missing edge cases. The task graph grows as agents learn.
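To make step 1 concrete, here is a minimal sketch of an evaluator-gated phase loop. It is not the OmoiOS code: `run_pipeline`, `PhaseResult`, the 0.8 threshold, and the three-attempt limit are all assumptions for illustration, and `generate` and `evaluate` stand in for whatever LLM calls produce and score a phase.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

# Phase names taken from the post; everything else here is hypothetical.
PHASES = ["explore", "requirements", "design", "tasks"]


@dataclass
class PhaseResult:
    phase: str
    output: str
    score: float
    attempts: int


def run_pipeline(
    feature: str,
    generate: Callable[[str, str, str], str],
    evaluate: Callable[[str, str], Tuple[float, str]],
    threshold: float = 0.8,   # assumed pass mark
    max_attempts: int = 3,    # assumed retry budget per phase
) -> List[PhaseResult]:
    """Run each phase until its evaluator score passes, then advance."""
    context = feature
    results: List[PhaseResult] = []
    for phase in PHASES:
        feedback = ""
        for attempt in range(1, max_attempts + 1):
            output = generate(phase, context, feedback)
            score, feedback = evaluate(phase, output)
            if score >= threshold:
                break  # advance on pass
        else:
            # Loop ended without a passing score: stop instead of coding
            # against a spec the evaluator rejected.
            raise RuntimeError(f"{phase} failed after {max_attempts} attempts")
        results.append(PhaseResult(phase, output, score, attempt))
        context = output  # the next phase builds on the accepted output
    return results
```

Evaluator feedback from a failed attempt is passed into the next `generate` call, so a retry revises the previous attempt instead of regenerating it blind.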
What's hard (honest):
- Spec quality is the bottleneck. Vague spec = agents spinning.
- Validation is domain-specific. API correctness is easy. UI quality is not.
- Discovery branching can grow the task graph unexpectedly.
- Sandbox overhead adds latency per task. Worth it, but a tradeoff.
- Merging parallel branches with real conflicts is the hardest problem.
- Guardian monitoring (per-agent trajectory analysis) has rough edges still.
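The discovery-branching point is easier to see in code. Below is a rough sketch of a validate, retry, and discover loop with a hard cap on graph size. It is an illustration under assumptions, not how OmoiOS does it: `run_tasks`, `max_tasks`, and `max_retries` are made-up names, tasks are plain strings, and dependencies and sandboxes are left out.

```python
from collections import deque
from typing import Callable, Iterable, List, Tuple


def run_tasks(
    initial_tasks: Iterable[str],
    execute: Callable[[str], object],
    validate: Callable[[str, object], Tuple[bool, List[str]]],
    max_tasks: int = 50,   # assumed cap on total tasks, including discovered ones
    max_retries: int = 2,  # assumed retry budget per task
) -> Tuple[List[str], List[str]]:
    """Run tasks, retry failures, and enqueue discovered work up to a cap."""
    initial = list(initial_tasks)
    queue = deque((task, 0) for task in initial)
    seen = set(initial)
    done: List[str] = []
    failed: List[str] = []
    while queue:
        task, retries = queue.popleft()
        result = execute(task)
        passed, discovered = validate(task, result)
        if passed:
            done.append(task)
        elif retries < max_retries:
            queue.append((task, retries + 1))  # feed the failure back for retry
        else:
            failed.append(task)
        for new_task in discovered:
            # Without this cap, a validator that always finds "one more
            # edge case" keeps the loop running forever.
            if new_task not in seen and len(seen) < max_tasks:
                seen.add(new_task)
                queue.append((new_task, 0))
    return done, failed
```

A flat cap is the bluntest possible control. It guarantees termination, but it says nothing about which discovered tasks are worth keeping.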
Stack: Python/FastAPI, PostgreSQL+pgvector, Redis (~190K lines). Next.js 15 + React Flow (~83K lines TS). Claude Agent SDK + Daytona Cloud. 686 commits since Nov 2025, built solo. Apache 2.0.
I keep coming back to the same problem: structured spec generation that produces genuinely machine-checkable acceptance criteria. Has anyone found an approach that works for non-trivial features, or is this just fundamentally hard?
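One possible reading of "machine-checkable", as a sketch only: each acceptance criterion carries a predicate that can be run against whatever the task produced. `Criterion` and `check_all` are hypothetical names, not part of OmoiOS, and the artifact here is just a dict standing in for an API response.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, Iterable


@dataclass(frozen=True)
class Criterion:
    id: str
    description: str               # human-readable statement from the spec
    check: Callable[[Any], bool]   # executable form of the same statement


def check_all(criteria: Iterable[Criterion], artifact: Any) -> Dict[str, bool]:
    """Return pass/fail per criterion id; a check that raises counts as a fail."""
    results: Dict[str, bool] = {}
    for criterion in criteria:
        try:
            results[criterion.id] = bool(criterion.check(artifact))
        except Exception:
            results[criterion.id] = False
    return results


criteria = [
    Criterion("AC-1", "POST returns 201", lambda r: r["status"] == 201),
    Criterion("AC-2", "response body includes an id", lambda r: "id" in r["body"]),
]
report = check_all(criteria, {"status": 201, "body": {}})
```

Here `report` marks AC-1 as passed and AC-2 as failed. This shape fits API-style checks well. It does not help with criteria like UI quality, where no cheap predicate exists.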
GitHub: https://github.com/kivo360/OmoiOS
Live: https://omoios.dev
Custom agent framework in 2300 lines beats 400K-line bloatware; auditable and runs fully local.
Solves a genuine frustration, but it's a one-liner problem—call alert() on subprocess exit.
Replaces agent orchestration with deterministic code, but 'multi-agent dev team' space is crowded.
LangChain alternative with 2 dependencies and async-native architecture from the start.
Deterministic policy gates beat LLM guardrails when your agent tries to DROP TABLE.