LLM inference slowdown fixed (177 experiments, +37% attention) – in 48h
Fused int4 attention kernel on Metal keeps LLM speed constant as context grows.
A Python library that solves context window degradation in long-running LLM agents by moving memory management out of the model layer and into a deterministic engine.
Engine-managed memory beats model self-summarization on OOLONG benchmarks.
Developers building long-horizon LLM agent systems
LangChain memory · LlamaIndex context managers · Mem0
Fused int4 attention kernel on Metal keeps LLM speed constant as context grows.
40-line baton handoffs keep AI agents at peak performance across long tasks.
Relay architecture beats context rot, but depends entirely on Claude Code CLI for now.
Number theory instead of learned routers, but conjecture-level math and limited production evidence.
Git metaphor for agent memory is clever; execution and adoption remain unproven.
Agents cross-review each other's work to stop context drift before it hits the wiki.