
Instant code briefing for AI comprehension.

2 stars

Brf.it – Extracting code interfaces for LLM context

by jeff-lee · Mar 7, 2026 · 1 point · 0 comments

AI Analysis

●● Solid · Solve My Problem · Big Brain

Tree-sitter interface extraction cuts token usage by 6x, but chat context window optimization is becoming table stakes.

Strengths
  • Tree-sitter AST parsing avoids regex fragility and stays accurate across the 12 languages listed in the post, including Go, TypeScript, and Python.
  • Dual XML/Markdown output with automatic token counting lets you measure and optimize LLM context efficiently.
  • Gitignore-aware filtering and cross-platform CLI (Homebrew, curl, PowerShell) ship ready for immediate use.
Weaknesses
  • Chat context optimization is increasingly table stakes; Cursor, Continue, and GitHub Copilot already handle this natively.
  • No integration with LLM APIs or IDEs: you still have to copy and paste the output into your chat manually.
Target Audience

AI engineers and developers using LLM coding assistants with large codebases

Similar To

Cursor's codebase indexing · Continue.dev context selection · GitHub Copilot context window management

Post Description

I've been experimenting with ways to make AI coding assistants more efficient when working with large codebases.

The problem

When we give repository context to LLMs, we often send full files and implementations. But for many tasks (like understanding architecture or navigating a repo), the model doesn't actually need most of that.

This leads to two issues:
- unnecessary token usage
- noisy context

The idea

Instead of sharing the full implementation, what if we only shared the interface surface of the code?

Function signatures, types, imports, and documentation: basically the structure of the system rather than its implementation details.
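To make the "interface surface" idea concrete, here is a minimal sketch for Python sources only. It swaps in Python's standard-library `ast` module (Python 3.9+) for Tree-sitter, which is what Brf.it itself uses, and `extract_interface` is a hypothetical helper, not part of Brf.it. It keeps imports, function signatures, and the first line of each docstring, and drops every function body.

```python
import ast


def extract_interface(source: str) -> list[str]:
    """Keep imports, function signatures, and docstring summaries; drop bodies.

    Hypothetical helper: a stdlib `ast` stand-in for Tree-sitter, Python only.
    """
    entries: list[tuple[int, str]] = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, (ast.Import, ast.ImportFrom)):
            entries.append((node.lineno, ast.unparse(node)))
        elif isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            prefix = "async def" if isinstance(node, ast.AsyncFunctionDef) else "def"
            # ast.unparse on the arguments node rebuilds "name: type, ..." without the body.
            sig = f"{prefix} {node.name}({ast.unparse(node.args)})"
            if node.returns is not None:
                sig += f" -> {ast.unparse(node.returns)}"
            doc = ast.get_docstring(node)
            if doc:
                # Keep only the first docstring line as a one-line summary.
                sig += f"  # {doc.splitlines()[0]}"
            entries.append((node.lineno, sig))
    # ast.walk is breadth-first, so restore source order by line number.
    return [text for _, text in sorted(entries)]


SOURCE = '''
import json

def fetch_user(raw: str, user_id: str) -> dict:
    """Fetch a user record from a JSON blob; raises KeyError if missing."""
    users = json.loads(raw)
    return users[user_id]
'''

for line in extract_interface(SOURCE):
    print(line)
```

The body lines (`json.loads`, the lookup) never reach the output. That is the whole point: the model still sees what the function takes, returns, and promises, but none of the implementation.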

The experiment

I built a small CLI tool called Brf.it to test this idea. It uses Tree-sitter to parse code and extract structural information.

Example output:

<file path="src/api.ts">
  <function>fetchUser(id: string): Promise<User></function>
  <doc>Fetches user from API, throws on 404</doc>
</file>
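A small renderer can produce output in the same `<file>` / `<function>` / `<doc>` shape as the example. This is a sketch under assumptions: the `Entry` type and `render_file` function are hypothetical, and unlike the raw `Promise<User>` in the example, it escapes `<`, `>`, and `&` so the result is well-formed XML. The post does not say whether Brf.it escapes these characters.

```python
from dataclasses import dataclass
from xml.sax.saxutils import escape, quoteattr


@dataclass
class Entry:
    """One extracted symbol: its signature and an optional doc summary (hypothetical type)."""
    signature: str
    doc: str = ""


def render_file(path: str, entries: list[Entry]) -> str:
    """Render extracted entries as <file>/<function>/<doc> elements."""
    # quoteattr wraps the path in quotes and escapes it for use as an attribute value.
    lines = [f"<file path={quoteattr(path)}>"]
    for e in entries:
        # escape() turns '<' into '&lt;' etc., so generics like Promise<User> stay well-formed.
        lines.append(f"  <function>{escape(e.signature)}</function>")
        if e.doc:
            lines.append(f"  <doc>{escape(e.doc)}</doc>")
    lines.append("</file>")
    return "\n".join(lines)


print(render_file("src/api.ts", [
    Entry("fetchUser(id: string): Promise<User>", "Fetches user from API, throws on 404"),
]))
```

Escaping costs a few extra tokens per generic type; wrapping signatures in CDATA sections would be an alternative if token count matters more than strict parser compatibility.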

In one example from a repo, a ~50-token function compresses to ~8 tokens when reduced to just its signature and documentation.
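For reference, ~50 / ~8 ≈ 6.3, consistent with the roughly 6x reduction cited in the analysis. To sanity-check ratios like this on your own code, a crude estimate is enough for relative comparisons. The sketch below assumes the common rule of thumb of about 4 characters per token; it is not a real tokenizer, so its counts will differ from what Brf.it or any model reports. `approx_tokens` is a hypothetical helper, and `FULL` is a made-up implementation of the `fetchUser` function from the example.

```python
import math


def approx_tokens(text: str) -> int:
    """Rough token estimate using the ~4 characters/token heuristic (not a real tokenizer)."""
    return math.ceil(len(text) / 4)


# Hypothetical full implementation of fetchUser (the post only shows its signature).
FULL = '''export async function fetchUser(id: string): Promise<User> {
  const res = await fetch(`/api/users/${id}`);
  if (res.status === 404) throw new Error("not found");
  return (await res.json()) as User;
}'''

# Signature plus doc summary, i.e. what an interface-only briefing keeps.
BRIEF = "fetchUser(id: string): Promise<User> // Fetches user from API, throws on 404"

full_t, brief_t = approx_tokens(FULL), approx_tokens(BRIEF)
print(f"full ~{full_t} tokens, brief ~{brief_t} tokens, ratio ~{full_t / brief_t:.1f}x")
```

The ratio you get depends heavily on how long the body is relative to the signature and doc comment, so short functions compress far less than long ones. For exact numbers, substitute the tokenizer of the model you actually use.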

The goal isn't to replace sharing full code, but to provide a lightweight context layer for things like:
- architecture understanding
- repo navigation
- initial prompt context for AI agents

Inspired partly by repomix, but with a different approach: instead of compressing the full repo, it extracts the API-level structure.

Language support so far: Go, TypeScript, JavaScript, Python, Rust, C, C++, Java, Swift, Kotlin, C#, Lua

Project: https://indigo-net.github.io/Brf.it/

Curious if others have tried similar approaches.

What information do you think is actually essential for LLM code understanding? Are function signatures + docs enough for architecture reasoning? Are there formats that work better for LLM consumption?

Similar Projects

AI/ML · Mid

Extract (financial) data from emails with local LLM

Local LLM email parsing when Plaid and receipt scanners already exist.

Ship It
brainless
103mo ago
AI/ML · ●● Solid

Mnemo – local-first AI memory layer for any LLM (Rust, SQLite, petgraph)

Knowledge graph memory beats pure vector search, but Mem0 and LangChain already own this space.

Big Brain · Ship It
zaydmulani
603011d ago