Semble – Code search for agents that uses 98% fewer tokens than grep
Static Model2Vec embeddings reach 99% of transformer retrieval quality while running entirely on CPU.
Fast and Accurate Code Search for Agents. Uses ~98% fewer tokens than grep+read
Cuts agent token usage by ~98% compared to grep+read, with no GPU inference required.
Developers building AI coding agents or RAG pipelines
Greptile · Sourcegraph Cody · LlamaIndex
So we built Semble. It combines static Model2Vec embeddings (using our latest static model, potion-code-16M) with BM25, fuses the two rankings via RRF, and reranks the result with code-aware signals. Everything runs on CPU since there are no transformers involved. On our benchmark of ~1250 query/document pairs across 63 repos and 19 languages, it uses 98% fewer tokens than grep+read and reaches 99% of the retrieval quality of a 137M-parameter code-trained transformer, while being ~200x faster.
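For readers unfamiliar with RRF (reciprocal rank fusion), here is a minimal sketch of how two ranked lists, one from an embedding search and one from BM25, can be merged. This is not Semble's actual code: the function name, the file names, and the smoothing constant k=60 (a commonly used default) are assumptions for illustration, and the code-aware reranking step is not shown.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document ids into one ranking.

    Each document scores 1 / (k + rank) for every list it appears in
    (rank is 1-based). k=60 is an assumed, commonly used default and
    is not taken from Semble.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties are broken by id so output is stable.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


# Hypothetical results for a single query from the two retrievers.
dense_hits = ["auth.py", "session.py", "utils.py"]
bm25_hits = ["session.py", "tokens.py", "auth.py"]

fused = reciprocal_rank_fusion([dense_hits, bm25_hits])
print(fused)
```

Because RRF only looks at rank positions, it needs no score normalization between the embedding similarities and the BM25 scores, which is one reason it is a common choice for hybrid search.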
Main features:
- Token-efficient: 98% fewer tokens than grep+read
- Fast: ~250ms to index a typical repo on our benchmark, ~1.5ms per query on CPU (very large repos may take longer)
- Accurate: 0.854 NDCG@10, 99% of the best transformer setup we tested
- MCP server: drop-in for Claude Code, Cursor, Codex, OpenCode
- Zero config: no API keys, no GPU, no external services
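As background for the NDCG@10 figure above, here is a minimal sketch of how the metric is typically computed for one query. It is not the benchmark's actual evaluation code: the function names are hypothetical, and it assumes the relevance list covers every relevant document for the query, so the ideal ordering can be obtained by sorting that same list.

```python
import math


def dcg_at_k(relevances, k):
    """Discounted cumulative gain over the top-k results (linear gain)."""
    return sum(rel / math.log2(i + 2) for i, rel in enumerate(relevances[:k]))


def ndcg_at_k(relevances, k=10):
    """NDCG@k for one query.

    `relevances` holds the relevance label of each returned result, in
    ranked order. Assumption: every relevant document appears somewhere
    in this list, so sorting it gives the ideal ranking.
    """
    ideal = dcg_at_k(sorted(relevances, reverse=True), k)
    return dcg_at_k(relevances, k) / ideal if ideal > 0 else 0.0


# Hypothetical query whose single relevant file is ranked second.
score = ndcg_at_k([0, 1, 0])
print(round(score, 3))
```

A benchmark score is then the mean of this per-query value over all queries, so a perfect ranking on every query gives 1.0.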
Install in Claude Code with: claude mcp add semble -s user -- uvx --from "semble[mcp]" semble
Or check our README for other installation instructions, benchmarks, and methodology:
Semble: https://github.com/MinishLab/semble
Benchmarks: https://github.com/MinishLab/semble/tree/main/benchmarks
Model: https://huggingface.co/minishlab/potion-code-16M
Let us know if you have any feedback or questions!