GitHub Repository

Fast and Accurate Code Search for Agents. Uses ~98% fewer tokens than grep+read

5,138 stars · Python

Semble – Code search for agents that uses 98% fewer tokens than grep

Author: stephantul

by stephantul · May 3, 2026 · 8 points · 0 comments


AI Analysis

●●● Banger · Big Brain · Solve My Problem · Ship It

Uses about 98% fewer tokens than grep+read for agent code search, without needing GPU inference.

Strengths

• Static Model2Vec embeddings eliminate the GPU dependency for indexing and search.
• MCP server integration works as a drop-in for Claude Code, Cursor, Codex, and OpenCode.
• Benchmarks show 99% of the retrieval quality of a 137M-parameter code-trained transformer, at lower cost.

Weaknesses

• Niche agent focus limits its utility for human developers searching code manually.
• Relies on the potion-code-16M model, which may lag behind the latest transformer architectures.

Post Description

Hey HN! We (Stephan and Thomas) recently open-sourced Semble. We kept running into the same problem while using Claude Code on large codebases: when the agent can't find something directly, it falls back to grep, reading full files or launching subagents. This uses a lot of tokens, and often still misses the relevant code. There are existing tools for this, but they were either too slow to index on demand, needed API keys, or had poor retrieval quality.

So we built Semble. It combines static Model2Vec embeddings (using our latest static model: potion-code-16M) with BM25, fused via RRF and reranked with code-aware signals. Everything runs on CPU since there are no transformers involved. On our benchmark of ~1250 query/document pairs across 63 repos and 19 languages, it uses 98% fewer tokens than grep+read and reaches 99% of the retrieval quality of a 137M-parameter code-trained transformer, while being ~200x faster.
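For readers unfamiliar with RRF (Reciprocal Rank Fusion), here is a minimal sketch of the general technique, not Semble's actual code. It assumes each retriever (say, BM25 and the embedding search) returns a ranked list of document ids. The file names are made up, and k=60 is the constant commonly used in the RRF literature, not necessarily the value Semble uses.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document ids into a single ranking.

    Each document scores 1 / (k + rank) in every list it appears in
    (rank starts at 1); the per-list scores are summed.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by id for determinism.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


# Hypothetical rankings from a lexical and an embedding-based retriever.
bm25_ranking = ["auth.py", "session.py", "utils.py"]
dense_ranking = ["session.py", "tokens.py", "auth.py"]

fused = reciprocal_rank_fusion([bm25_ranking, dense_ranking])
print(fused)
```

Because RRF only looks at ranks, it needs no score normalization between BM25 and cosine similarity, which is one reason it is a common choice for hybrid search.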

Main features:

- Token-efficient: 98% fewer tokens than grep+read

- Fast: ~250ms to index a typical repo on our benchmark, ~1.5ms per query on CPU (very large repos may take longer)

- Accurate: 0.854 NDCG@10, 99% of the best transformer setup we tested

- MCP server: drop-in for Claude Code, Cursor, Codex, OpenCode

- Zero config: no API keys, no GPU, no external services
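For context on the NDCG@10 figure in the list above, this is a small sketch of how NDCG@k is typically computed. It is the textbook definition, not the benchmark's own evaluation script. It assumes the input list holds a relevance label for every judged document, in the order the system ranked them; the labels in the example are made up.

```python
import math


def dcg_at_k(relevances, k=10):
    """Discounted cumulative gain over the top-k results."""
    return sum(rel / math.log2(i + 2) for i, rel in enumerate(relevances[:k]))


def ndcg_at_k(relevances, k=10):
    """DCG of the given ranking divided by the DCG of the ideal ordering."""
    ideal = dcg_at_k(sorted(relevances, reverse=True), k)
    return dcg_at_k(relevances, k) / ideal if ideal > 0 else 0.0


# Hypothetical query: the single relevant chunk was returned in second place.
score = ndcg_at_k([0, 1, 0])
print(round(score, 3))
```

A score of 1.0 means every relevant result is ranked as high as possible; relevant results that appear lower in the top 10 are discounted logarithmically.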

Install in Claude Code with: claude mcp add semble -s user -- uvx --from "semble[mcp]" semble

Or check our README for other installation instructions, benchmarks, and methodology:

Semble: https://github.com/MinishLab/semble

Benchmarks: https://github.com/MinishLab/semble/tree/main/benchmarks

Model: https://huggingface.co/minishlab/potion-code-16M

Let us know if you have any feedback or questions!

Similar Projects

AI/ML · ●●● Banger

Semble – Code search for agents that uses 98% fewer tokens than grep

Static Model2Vec embeddings reach 99% of transformer retrieval quality while running entirely on CPU.

Big Brain · Solve My Problem

Bibabomas

44515127d ago

Developer Tools · ●● Solid

ngrep – grep plus word embeddings

Semantic grep with word embeddings when traditional grep only does syntax.

Big Brain · Wizardry

xnan

323mo ago

Developer Tools · ●● Solid

Save Claude tokens with semantic search powered by SQLite and Ollama

Transparent benchmarks show 39% cost cuts — rare to see real numbers in AI tooling.

Big Brain · Solve My Problem

illogicalabc

812mo ago

Developer Tools · ●●●● Gem

The Mog Programming Language

First language designed for LLMs to modify safely with capability permissions.

Zero to One · Wizardry · Big Brain

belisarius222

163833mo ago

Developer Tools · ●●● Banger

Tappi Browser – Fastest AI browser, 3-10x fewer tokens, zero telemetry

Referenced element indexing cuts token spend 3-10x versus DOM-dumping AI browsers.

Wizardry · Big Brain · Ship It

shaihazher

103mo ago

Developer Tools · ●● Solid

MAKO – Open protocol for LLM-optimized web content (93% fewer tokens)

MAKO compresses what matters into a HEAD-friendly payload — frontmatter, declared actions and semantic links — so agents can find relevance without downloading 181KB of navigation, ads and scripts. The project ships a spec plus real tooling (typed SDK, Express middleware, an analyzer/score and edge-friendly /md conversion), which is a rare combo of protocol thinking and usable developer ergonomics. Whether it becomes a standard depends on buy-in from CMS/plugin authors and agent platforms, but technically it's a smart, practical swing at an obvious pain point.

Big Brain · Slick

juanisidoro

113mo ago