GitHub Repository

Fast and Accurate Code Search for Agents. Uses ~98% fewer tokens than grep+read

5,035 stars · Python

Semble – Code search for agents that uses 98% fewer tokens than grep


by Bibabomas·May 17, 2026·445 points·151 comments


AI Analysis

●●● Banger · Big Brain · Solve My Problem

Static Model2Vec embeddings reach 99% of the retrieval quality of a 137M-parameter code-trained transformer while running entirely on CPU.

Strengths

• 98% token reduction directly addresses the cost bottleneck of agent code search.
• Sub-second indexing on CPU removes the need for GPU infrastructure.
• MCP server integration works immediately with Claude Code and Cursor.

Weaknesses

• Benchmarks rely on a specific dataset of 63 repos; real-world variance unknown.
• Static embeddings may struggle with highly dynamic or polyglot codebases.

Post Description

Hey HN! We (Stephan and Thomas) recently open-sourced Semble. We kept running into the same problem while using Claude Code on large codebases: when the agent can't find something directly, it falls back to grep, reading full files or launching subagents. This uses a lot of tokens, and often still misses the relevant code. There are existing tools for this, but they were either too slow to index on demand, needed API keys, or had poor retrieval quality.

Semble is our solution for this. It combines static Model2Vec embeddings (using our latest static model: potion-code-16M) with BM25, fused via RRF and reranked with code-aware signals. Everything runs on CPU since there are no transformers involved. On our benchmark of ~1250 query/document pairs across 63 repos and 19 languages, it uses 98% fewer tokens than grep+read and reaches 99% of the retrieval quality of a 137M-parameter code-trained transformer, while being ~200x faster.
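
For readers unfamiliar with RRF (reciprocal rank fusion), here is a minimal sketch of the general technique, not Semble's actual code. The constant k=60 is the commonly used default and is an assumption here, as are the function name and the file names in the two example rankings.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document ids into a single ranking.

    Each document gets 1 / (k + rank) from every list it appears in
    (ranks start at 1); the contributions are summed across lists.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by doc id so output is deterministic.
    return sorted(scores, key=lambda d: (-scores[d], d))


# Hypothetical results from an embedding search and a BM25 search.
dense_hits = ["auth.py", "session.py", "tokens.py"]
bm25_hits = ["tokens.py", "auth.py", "README.md"]

fused = reciprocal_rank_fusion([dense_hits, bm25_hits])
print(fused)
```

Documents that rank well in both lists rise to the top, which is why rank-based fusion works without having to put embedding similarities and BM25 scores on the same scale.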

Main features:

- Token-efficient: 98% fewer tokens than grep+read

- Fast: ~250ms to index a typical repo on our benchmark, ~1.5ms per query on CPU (very large repos may take longer)

- Accurate: 0.854 NDCG@10, 99% of the best transformer setup we tested

- MCP server: drop-in for Claude Code, Cursor, Codex, OpenCode

- Zero config: no API keys, no GPU, no external services
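
As a rough illustration of what the NDCG@10 figure above measures, here is a small sketch of the standard metric. It is not taken from the Semble benchmark code; the function names are illustrative, and it assumes the relevance list covers every judged document for the query, in the order the search returned them.

```python
import math


def dcg_at_k(relevances, k=10):
    """Discounted cumulative gain over the top k results."""
    return sum(rel / math.log2(i + 2) for i, rel in enumerate(relevances[:k]))


def ndcg_at_k(relevances, k=10):
    """DCG of the actual ranking divided by the DCG of the ideal ranking."""
    ideal = dcg_at_k(sorted(relevances, reverse=True), k)
    return dcg_at_k(relevances, k) / ideal if ideal > 0 else 0.0


# Hypothetical query: the two relevant documents came back at positions 2 and 5.
score = ndcg_at_k([0, 1, 0, 0, 1])
print(round(score, 3))
```

A score of 1.0 means every relevant document was ranked ahead of every irrelevant one; pushing relevant results further down the list lowers the score.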

Install in Claude Code with: claude mcp add semble -s user -- uvx --from "semble[mcp]" semble

Or check our README for other installation instructions, benchmarks, and methodology:

Semble: https://github.com/MinishLab/semble

Benchmarks: https://github.com/MinishLab/semble/tree/main/benchmarks

Model: https://huggingface.co/minishlab/potion-code-16M

Let us know if you have any feedback or questions!

Similar Projects

Developer Tools · ●●● Banger

Semble – Code search for agents that uses 98% fewer tokens than grep

Cuts agent token costs by 98% compared to grep without needing GPU inference.

Big Brain · Solve My Problem · Ship It

stephantul

801mo ago

Developer Tools · ●● Solid

ngrep – grep plus word embeddings

Semantic grep with word embeddings when traditional grep only does syntax.

Big Brain · Wizardry

xnan

323mo ago

Developer Tools · ●●●● Gem

The Mog Programming Language

First language designed for LLMs to modify safely with capability permissions.

Zero to One · Wizardry · Big Brain

belisarius222

163833mo ago

Developer Tools · ●●● Banger

Tappi Browser – Fastest AI browser, 3-10x fewer tokens, zero telemetry

Referenced element indexing cuts token spend 3-10x versus DOM-dumping AI browsers.

Wizardry · Big Brain · Ship It

shaihazher

103mo ago

Developer Tools · ●● Solid

MAKO – Open protocol for LLM-optimized web content (93% fewer tokens)

MAKO compresses what matters into a HEAD-friendly payload — frontmatter, declared actions and semantic links — so agents can find relevance without downloading 181KB of navigation, ads and scripts. The project ships a spec plus real tooling (typed SDK, Express middleware, an analyzer/score and edge-friendly /md conversion), which is a rare combo of protocol thinking and usable developer ergonomics. Whether it becomes a standard depends on buy-in from CMS/plugin authors and agent platforms, but technically it's a smart, practical swing at an obvious pain point.

Big Brain · Slick

juanisidoro

113mo ago

AI/ML · ●● Solid

Open-Source Knowledge Agents Template

Swaps vector databases for filesystem grep to cut costs and improve traceability.

Big Brain · Ship It

flashbrew

312mo ago