Claude Code skills for building LLM evals
Structured eval workflow for Claude Code when LangSmith and Braintrust already exist.

Just a Substack essay about LLMs, not a tool or code project.
Developers navigating AI tools
Medium · Substack · Dev.to
Isolated subagent contexts mean each AI opponent truly can't see other players' cards.
The CPU/kernel/process analogy is more than marketing: the project actually spawns short-lived Sub-Agents to limit context pollution and pairs that with a GitHub 'App Store' for one-click skill installs. It's a practical, Windows-centric proof of concept with sandboxing and security scans, but it isn't a clear leap beyond existing agent frameworks (LangChain, AutoGPT), and the repo looks early-stage with a small community.
TLA+ code generation for agents, but the audience is tiny: it's only useful if your agent needs formal verification.
Finally, Rails conventions for LLM calls instead of scattered API code in controllers.
4x token savings on screenshots, with text still readable at 800px in greyscale.