Calx – track and compile corrections humans make with AI agents

Author: spenceships

by spenceships·Apr 1, 2026·3 points·1 comment


AI Analysis

● Mid · Big Brain

Rules don't transfer across contexts—executable mechanisms do, per field study.

Strengths

•Real production data from 82k lines of code across 6 AI agents in 20 days
•Published correction logs with full evidence on GitHub
•Aligns with Meta's HyperAgents research on embodied mechanisms

Weaknesses

•No actual tool or product—just research observations and logs
•GitHub org has 2 repos with minimal activity and no public members

Post Description

Last year I got laid off and started building a company. Fast forward to a month ago: I built a production system with 6 AI agents across 82,000 lines of code in 20 days for $250. I kept obsessive correction logs. Every time an agent made a mistake, I told it what to do differently and made sure it logged the correction itself.

When I transferred 237 of those corrections as rules to a new agent to save onboarding time in a new repo, it made 44 new mistakes. 13 of those (about 30%) were in categories the rules explicitly covered. The rules were present in context. The behavior wasn't there. I published the field study with full correction logs.

Then Meta's Superintelligence Labs published HyperAgents (arXiv:2603.19461, March 2026). They found the complementary result: improvements DO transfer across domains when embodied in executable mechanisms (persistent memory, performance tracking, eval loops), not when written as rule text. Two independent studies, same boundary: documentation is not behavior.

So I built Calx. pip install getcalx gives you a CLI + MCP server that:

•Captures corrections developers make to AI agents
•Detects recurrence via keyword similarity (Jaccard), auto-promotes at 3x threshold
•Promotes recurring corrections to enforced rules and hooks, injected at session start
•Scopes rules per domain/directory so each agent gets only what's relevant
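To make the recurrence step concrete, here is a minimal sketch of Jaccard-based grouping with promotion on the third occurrence. It is not Calx's actual code: the `CorrectionTracker` class, the stopword list, the tokenizer, and the 0.5 similarity cutoff are all assumptions for illustration. The post only states that similarity is keyword-based Jaccard and that promotion happens at a 3x threshold.

```python
import re

PROMOTE_AT = 3           # the "3x threshold" from the post; exact semantics assumed
SIMILARITY_CUTOFF = 0.5  # assumed value; the post does not state a cutoff
STOPWORDS = {"a", "an", "the", "to", "of", "in", "for", "and", "or", "use", "instead"}


def keywords(text: str) -> frozenset:
    """Lowercase the text, split it into word tokens, and drop stopwords."""
    tokens = re.findall(r"[a-z0-9_]+", text.lower())
    return frozenset(t for t in tokens if t not in STOPWORDS)


def jaccard(a: frozenset, b: frozenset) -> float:
    """Jaccard similarity: |A ∩ B| / |A ∪ B|, defined as 0.0 for two empty sets."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


class CorrectionTracker:
    """Hypothetical tracker: groups similar corrections and flags recurrence."""

    def __init__(self) -> None:
        # Each group keeps the keyword set of its first correction as representative.
        self.groups = []

    def record(self, text: str) -> bool:
        """Store a correction; return True when its group reaches PROMOTE_AT."""
        kw = keywords(text)
        for group in self.groups:
            if jaccard(kw, group["keywords"]) >= SIMILARITY_CUTOFF:
                group["texts"].append(text)
                return len(group["texts"]) == PROMOTE_AT
        self.groups.append({"keywords": kw, "texts": [text]})
        return False
```

In this sketch, `record` returns `True` exactly once per group, on the correction that brings the group to three members, which is the point where a real system would turn the correction into an enforced rule or hook.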

It runs as a FastMCP server over Streamable HTTP (SQLite locally) so any MCP-compatible client connects: Claude Code, Claude Desktop, Cursor, custom agents. It is primarily designed for Claude Code. It also handles token discipline (prevents context compaction from destroying correction signal), multi-agent orchestration, session lifecycle hooks, orientation gates, and dirty-exit recovery.
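As a rough illustration of how local SQLite storage and per-directory scoping could fit together, the sketch below keeps promoted rules in one table and selects the ones that apply to an agent's working directory at session start. The `rules` table, its columns, and the `open_store`, `add_rule`, and `rules_for` helpers are hypothetical names, not Calx's real schema or API, and paths are assumed to be relative to the repo root.

```python
import sqlite3
from pathlib import PurePosixPath

# Hypothetical schema: one row per promoted rule, scoped to a directory.
SCHEMA = """
CREATE TABLE IF NOT EXISTS rules (
    id    INTEGER PRIMARY KEY,
    scope TEXT NOT NULL,  -- repo-relative directory; "." means repo-wide
    text  TEXT NOT NULL
)
"""


def open_store(path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) the local rule store."""
    conn = sqlite3.connect(path)
    conn.execute(SCHEMA)
    return conn


def add_rule(conn: sqlite3.Connection, scope: str, text: str) -> None:
    """Persist one promoted rule under the given directory scope."""
    conn.execute("INSERT INTO rules (scope, text) VALUES (?, ?)", (scope, text))
    conn.commit()


def rules_for(conn: sqlite3.Connection, workdir: str) -> list:
    """Return rule texts whose scope is workdir itself or one of its ancestors."""
    wd = PurePosixPath(workdir)
    rows = conn.execute("SELECT scope, text FROM rules ORDER BY id").fetchall()
    selected = []
    for scope, text in rows:
        scope_path = PurePosixPath(scope)
        if scope_path == wd or scope_path in wd.parents:
            selected.append(text)
    return selected
```

A session-start hook could call `rules_for(conn, "services/api/handlers")` and inject only the returned rules, so an agent working in one directory never sees rules scoped to a sibling directory.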

The difference from agent memory tools: existing agent memory systems store information for retrieval. Calx tracks the behavioral plane: how an agent works with a specific person, not just what it knows. The data shows the information plane alone doesn't reliably change behavior.

v0.5.0, 443 tests, MIT license. Paper with full evidence: https://doi.org/10.5281/zenodo.19159223