DocMason is a repo-native agent that turns your complex office files into a local LLM knowledge base and your second brain. The repo is the app. Codex is the runtime.

127 stars · Python

DocMason – Agent Knowledge Base for local complex office files

by Jet_Xu·Apr 4, 2026·11 points·0 comments

AI Analysis

●● Solid · Big Brain · Niche Gem

Provenance-first RAG beats anonymous text chunks, but Cursor and Continue already own this space.

Strengths
  • Multimodal extraction preserves slide layouts and spreadsheet structure instead of flattening to text
  • Strict traceability means every answer links back to source evidence bundles
  • Repo-native paradigm with zero cloud ingestion keeps everything local
Weaknesses
  • macOS-only, which limits adoption; no Linux or Windows support is mentioned
  • Depends on an OpenAI Codex subscription, so it is hardly open or accessible
Category
Target Audience

IT architects, analysts working with complex internal documents

Similar To

Cursor · Continue · Sourcegraph Cody

Post Description

I think everyone has already read Karpathy's post about LLM knowledge bases. For the past few weeks I have been working on an agent-native knowledge base for complex research, called DocMason. It runs purely in Codex or Claude Code. I call this paradigm: The repo is the app. Codex is the runtime.

In my daily work I deal with tons of office documents holding knowledge from every team, and as an IT architect I need to combine them all to handle complex deep research, which a plain LLM definitely could not help with. That is the original reason I built DocMason, and I use it every day to support me on lots of complex topics.

I have already open-sourced this repo. I think it takes Karpathy's concept a step further for real-world usage in three ways:
1. It can handle most kinds of office documents (pptx, docx, Excel workbooks, even .eml), and it really extracts multimodal information from IT architecture diagrams and Excel sheets.
2. It runs as a real app, not a naive RAG tool. DocMason prepares its environment, updates itself, and incrementally syncs the knowledge base automatically.
3. Most importantly, it runs inside native AI agents, so it can leverage powerful agent engines (e.g. Codex or Claude Code).
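To make the incremental sync idea concrete, here is a minimal sketch of one common way to do it: fingerprint each file by content hash and compare against the previous manifest, so only new or changed documents are re-indexed. This is not taken from the DocMason code. The function names (`fingerprint`, `scan`, `plan_sync`), the manifest shape, and the `.xlsx` extension are all assumptions made for illustration.

```python
import hashlib
from pathlib import Path

# Assumed extension set, based on the formats named in the post
# (".xlsx" is a guess at what "Excel workbooks" covers).
OFFICE_SUFFIXES = {".pptx", ".docx", ".xlsx", ".eml"}


def fingerprint(path: Path) -> str:
    """Return the SHA-256 hex digest of a file's bytes."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def scan(root: Path) -> dict[str, str]:
    """Map each office file under root (relative POSIX path) to its digest."""
    return {
        p.relative_to(root).as_posix(): fingerprint(p)
        for p in sorted(root.rglob("*"))
        if p.is_file() and p.suffix.lower() in OFFICE_SUFFIXES
    }


def plan_sync(previous: dict[str, str], current: dict[str, str]) -> dict[str, list[str]]:
    """Compare two manifests and list what needs indexing or removal."""
    shared = current.keys() & previous.keys()
    return {
        "added": sorted(current.keys() - previous.keys()),
        "changed": sorted(k for k in shared if current[k] != previous[k]),
        "removed": sorted(previous.keys() - current.keys()),
    }
```

A caller would persist the result of `scan` after each run and pass it as `previous` next time. Hashing content rather than trusting modification times avoids reprocessing files that were only touched or copied.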

See the detailed architecture diagram in the DocMason README, then download it and give it a try :) You will find it helps a lot in daily work. I would love to hear your feedback and issues on GitHub!

Similar Projects

AI/ML · ●● Solid

DocMason – AI Agent Knowledge Base for local complex office files

Preserves document structure instead of flattening to text like most RAG tools.

Solve My Problem · Bold Bet
Jet_Xu
23 · 2mo ago
AI/ML · ●● Solid

Kilroy – Knowledge base for teams using Claude Code

Agents leaving notes for other agents via MCP is a clever pattern for tribal knowledge.

Niche Gem · Big Brain
t55
50 · 1mo ago