Calx – track and compile corrections humans make with AI agents

by spenceships·Apr 1, 2026·3 points·1 comment

AI Analysis

Mid · Big Brain

Rules don't transfer across contexts—executable mechanisms do, per field study.

Strengths
  • Real production data from 82k lines of code across 6 AI agents in 20 days
  • Published correction logs with full evidence on GitHub
  • Aligns with Meta's HyperAgents research on embodied mechanisms
Weaknesses
  • No actual tool or product—just research observations and logs
  • GitHub org has 2 repos with minimal activity and no public members
Target Audience

AI agent developers and researchers

Similar To

LangChain memory systems · Agent correction tracking tools

Post Description

Last year I got laid off and started building a company. Fast forward to a month ago: I built a production system with 6 AI agents across 82,000 lines of code in 20 days for $250. I kept obsessive correction logs. Every time an agent made a mistake, I told it what to do differently and made sure it logged the correction itself.

When I transferred 237 of those corrections as rules to a new agent to save time with onboarding in a new repo, it made 44 new mistakes. 13 were in categories the rules explicitly covered. The rules were present in context. The behavior wasn't there. I published the field study with full correction logs.

Then Meta's Superintelligence Labs published HyperAgents (arXiv:2603.19461, March 2026). They found the complementary result: improvements DO transfer across domains when embodied in executable mechanisms (persistent memory, performance tracking, eval loops), not when written as rule text. Two independent studies, same boundary: documentation is not behavior.

So I built Calx. pip install getcalx gives you a CLI + MCP server that:

  • Captures corrections developers make to AI agents
  • Detects recurrence via keyword similarity (Jaccard), auto-promotes at 3x threshold
  • Promotes recurring corrections to enforced rules and hooks, injected at session start
  • Scopes rules per domain/directory so each agent gets only what's relevant
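To make the recurrence step concrete, here is a minimal sketch of keyword-based Jaccard matching with promotion on the third occurrence. It is not Calx's actual code: the names `keywords`, `jaccard`, `CorrectionLog`, the stopword list, and the 0.5 similarity cutoff are all assumptions made for illustration. Only the use of Jaccard similarity and the 3x threshold come from the description above.

```python
import re
from dataclasses import dataclass, field

PROMOTE_AT = 3           # the "3x threshold" mentioned in the post
SIMILARITY_CUTOFF = 0.5  # assumed value; the post does not state one
STOPWORDS = {"a", "an", "the", "to", "for", "of", "in", "and", "not",
             "instead", "please"}  # assumed, illustrative only


def keywords(text: str) -> frozenset:
    """Lowercase the text, split it into word tokens, and drop stopwords."""
    tokens = re.findall(r"[a-z0-9_]+", text.lower())
    return frozenset(t for t in tokens if t not in STOPWORDS)


def jaccard(a: frozenset, b: frozenset) -> float:
    """Jaccard similarity: |A ∩ B| / |A ∪ B| (0.0 when both sets are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


@dataclass
class Cluster:
    keywords: frozenset
    texts: list = field(default_factory=list)


class CorrectionLog:
    """Hypothetical in-memory log that groups similar corrections."""

    def __init__(self) -> None:
        self.clusters = []

    def record(self, text: str) -> bool:
        """Store a correction; return True when it triggers promotion."""
        kw = keywords(text)
        for cluster in self.clusters:
            # Compare against the keywords of the cluster's first correction.
            if jaccard(kw, cluster.keywords) >= SIMILARITY_CUTOFF:
                cluster.texts.append(text)
                return len(cluster.texts) == PROMOTE_AT
        self.clusters.append(Cluster(kw, [text]))
        return False


log = CorrectionLog()
log.record("use pathlib not os.path for file paths")
log.record("use pathlib instead of os.path for paths")
promoted = log.record("please use pathlib not os.path for file paths")
```

In this sketch the third similar correction makes `record` return `True`, which is the point where a tool like this would turn the correction into an enforced rule. A real implementation would also need persistence (the post mentions SQLite) and per-directory scoping, both of which are left out here.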

It runs as a FastMCP server over Streamable HTTP (SQLite locally) so any MCP-compatible client connects: Claude Code, Claude Desktop, Cursor, custom agents. It is primarily designed for Claude Code. It also handles token discipline (prevents context compaction from destroying correction signal), multi-agent orchestration, session lifecycle hooks, orientation gates, and dirty-exit recovery.

The difference from agent memory tools: existing agent memory systems store information for retrieval. Calx tracks the behavioral plane: how an agent works with a specific person, not just what it knows. The data shows the information plane alone doesn't reliably change behavior.

v0.5.0, 443 tests, MIT license. Paper with full evidence: https://doi.org/10.5281/zenodo.19159223
