Anchor Engine – Deterministic, Local Semantic Memory for LLMs (<3GB RAM)
Compresses 28M tokens into 100k queryable characters, entirely locally, but reproduces RAG's problems at a smaller scale.
Graph-based context compression beats lossy summarization when tokens run out.
LLM application developers building long-running agents or chat systems
Mem0 · LangChain Memory · Zep
Breathe-memory takes a different approach: associative injection. Before each LLM call, it extracts anchors from the user's message (entities, temporal references, emotional signals), traverses a concept graph via BFS, runs optional vector search, and injects only what's relevant — typically in <60ms.
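To make the injection step concrete, here is a minimal sketch of anchor extraction followed by a depth-limited BFS over a concept graph. It is not breathe-memory's actual code: the function names (extract_anchors, bfs_related, build_injection), the dict-based graph, and the keyword-matching anchor extractor are all assumptions made for illustration, and the optional vector search step is left out.

```python
from collections import deque


def extract_anchors(message: str, known_concepts: set[str]) -> list[str]:
    """Hypothetical extractor: keep message words that match known concepts."""
    anchors: list[str] = []
    for raw in message.split():
        word = raw.strip(".,!?;:").lower()
        if word in known_concepts and word not in anchors:
            anchors.append(word)
    return anchors


def bfs_related(graph: dict[str, list[str]], anchors: list[str],
                max_depth: int = 2) -> list[str]:
    """Breadth-first traversal from the anchors, at most max_depth hops out."""
    visited = set(anchors)
    queue = deque((anchor, 0) for anchor in anchors)
    order: list[str] = []
    while queue:
        node, depth = queue.popleft()
        order.append(node)
        if depth == max_depth:
            continue  # do not expand beyond the depth limit
        for neighbor in graph.get(node, []):
            if neighbor not in visited:
                visited.add(neighbor)
                queue.append((neighbor, depth + 1))
    return order


def build_injection(message: str, graph: dict[str, list[str]],
                    notes: dict[str, str], max_depth: int = 1) -> list[str]:
    """Collect the stored notes attached to concepts reachable from the anchors."""
    anchors = extract_anchors(message, set(graph))
    return [notes[c] for c in bfs_related(graph, anchors, max_depth) if c in notes]


# Toy data, invented for this example.
graph = {
    "postgres": ["pgvector", "migrations"],
    "pgvector": ["embeddings"],
    "migrations": [],
    "embeddings": [],
}
notes = {
    "postgres": "Primary store is PostgreSQL.",
    "pgvector": "Vector search uses pgvector.",
    "embeddings": "Embeddings are 768-dimensional.",
}
injected = build_injection("What did we decide about Postgres?", graph, notes)
```

With max_depth=1 only the anchor and its direct neighbors contribute, so the note about embeddings (two hops away) is not injected. Keeping the depth limit small is what bounds the injected context in this sketch.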
When context fills up, instead of summarizing, it extracts a structured graph: topics, decisions, open questions, artifacts. This preserves the semantic structure that summaries destroy.
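As an illustration of what such a structured graph might look like, the sketch below defines a small container with the four node kinds named above plus typed edges between them. The class name ContextGraph, its fields, and the rendering format are assumptions for this example, not the project's schema; in practice the extraction itself would presumably be done by an LLM, which this sketch does not cover.

```python
from dataclasses import dataclass, field


@dataclass
class ContextGraph:
    """Hypothetical container for what survives when the context is compressed."""
    topics: list[str] = field(default_factory=list)
    decisions: list[str] = field(default_factory=list)
    open_questions: list[str] = field(default_factory=list)
    artifacts: list[str] = field(default_factory=list)
    # Each edge is (source, relation, target) and links any two entries above.
    edges: list[tuple[str, str, str]] = field(default_factory=list)

    def link(self, source: str, relation: str, target: str) -> None:
        """Record a typed relation between two entries."""
        self.edges.append((source, relation, target))

    def render(self) -> str:
        """Serialize the graph into compact text suitable for re-injection."""
        sections = [
            ("Topics", self.topics),
            ("Decisions", self.decisions),
            ("Open questions", self.open_questions),
            ("Artifacts", self.artifacts),
        ]
        lines = [f"{title}: " + "; ".join(items)
                 for title, items in sections if items]
        lines += [f"{s} -[{r}]-> {t}" for s, r, t in self.edges]
        return "\n".join(lines)


g = ContextGraph(topics=["storage"], decisions=["use pgvector"])
g.link("use pgvector", "resolves", "storage")
text = g.render()
```

Unlike a prose summary, the relation between the decision and the topic it resolves stays explicit and queryable after compression.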
The whole thing is ~1500 lines of Python, interface-based, zero mandatory deps. Plug in any database, any LLM, any vector store. Reference implementation uses PostgreSQL + pgvector.
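One common way to get this kind of pluggability in Python is structural typing with typing.Protocol, sketched below. The MemoryStore interface, its save/load methods, and the InMemoryStore backend are hypothetical names for illustration; the library's real interfaces may look different.

```python
from typing import Optional, Protocol


class MemoryStore(Protocol):
    """Hypothetical storage interface; any object with these methods qualifies."""

    def save(self, key: str, text: str) -> None: ...

    def load(self, key: str) -> Optional[str]: ...


class InMemoryStore:
    """Dict-backed stand-in; a PostgreSQL-backed class could expose the same methods."""

    def __init__(self) -> None:
        self._data: dict[str, str] = {}

    def save(self, key: str, text: str) -> None:
        self._data[key] = text

    def load(self, key: str) -> Optional[str]:
        return self._data.get(key)


def remember(store: MemoryStore, key: str, text: str) -> Optional[str]:
    """Depends only on the interface, never on a concrete backend."""
    store.save(key, text)
    return store.load(key)


result = remember(InMemoryStore(), "decision:db", "use pgvector")
```

Because the protocol is structural, a backend does not need to inherit from anything, which is one way to keep third-party dependencies optional.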
https://github.com/tkenaz/breathe-memory
We've been running this in production for several months. Open-sourcing because we think the approach (injection over retrieval) is underexplored and worth more attention.
We've also posted an article about memory injections in a more human-readable form, if you want to see the thinking under the hood: https://medium.com/towards-artificial-intelligence/beyond-ra...
Deterministic graphs instead of vector embeddings sound clever, but long-context windows and RAG tools already solve this problem cheaper.
Local RAG for browser LLMs with a decay lifecycle, but LangChain's vector stores already cover this ground.
Research framework with published paper, not a production red-teaming tool.
LLM-controlled memory dumper for game reversing—Claude as a Cheat Engine. Genuinely inventive pairing.
Direct weight editing for persistent memory—MEMIT meets LoRA consolidation with null-space math.