GitHub Repository

Fast and Accurate Code Search for Agents. Uses ~98% fewer tokens than grep+read

5,035 stars · Python

Semble – Code search for agents that uses 98% fewer tokens than grep

by Bibabomas·May 17, 2026·445 points·151 comments

AI Analysis

●●● Banger · Big Brain · Solve My Problem

Static Model2Vec embeddings reach 99% of the retrieval quality of a 137M-parameter code-trained transformer while running entirely on CPU.

Strengths
  • 98% token reduction directly addresses the cost bottleneck of agent code search.
  • Sub-second indexing on CPU removes the need for GPU infrastructure.
  • MCP server integration works immediately with Claude Code and Cursor.
Weaknesses
  • Benchmarks rely on a specific dataset of 63 repos; real-world variance unknown.
  • Static embeddings may struggle with highly dynamic or polyglot codebases.
Category
Target Audience

Developers building AI coding agents

Similar To

Sourcegraph Cody · grep · Embedchain

Post Description

Hey HN! We (Stephan and Thomas) recently open-sourced Semble. We kept running into the same problem while using Claude Code on large codebases: when the agent can't find something directly, it falls back to grep, reading full files or launching subagents. This uses a lot of tokens, and often still misses the relevant code. There are existing tools for this, but they were either too slow to index on demand, needed API keys, or had poor retrieval quality.

Semble is our solution for this. It combines static Model2Vec embeddings (using our latest static model: potion-code-16M) with BM25, fused via RRF and reranked with code-aware signals. Everything runs on CPU since there are no transformers involved. On our benchmark of ~1250 query/document pairs across 63 repos and 19 languages, it uses 98% fewer tokens than grep+read and reaches 99% of the retrieval quality of a 137M-parameter code-trained transformer, while being ~200x faster.
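
For readers unfamiliar with RRF (reciprocal rank fusion), here is a minimal sketch of how two ranked lists, one from BM25 and one from embedding similarity, could be merged. It is not Semble's actual implementation: the function name, the file names, and the constant k=60 (a commonly used default) are assumptions for illustration, and the code-aware reranking step is left out.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document ids into one ranked list.

    Each document scores sum(1 / (k + rank)) over the lists it appears in,
    where rank starts at 1. k=60 is an assumed, conventional default.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by id so the output is stable.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


# Hypothetical results for one query from the two retrievers.
bm25_hits = ["auth.py", "session.py", "utils.py"]
embedding_hits = ["session.py", "tokens.py", "auth.py"]

fused = reciprocal_rank_fusion([bm25_hits, embedding_hits])
print(fused)
```

Because RRF only looks at ranks, the BM25 scores and the embedding similarities never need to be put on a common scale, which is one reason it is a popular choice for hybrid search.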

Main features:

- Token-efficient: 98% fewer tokens than grep+read

- Fast: ~250ms to index a typical repo on our benchmark, ~1.5ms per query on CPU (very large repos may take longer)

- Accurate: 0.854 NDCG@10, 99% of the best transformer setup we tested

- MCP server: drop-in for Claude Code, Cursor, Codex, OpenCode

- Zero config: no API keys, no GPU, no external services
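
As context for the NDCG@10 figure in the list above, this is a rough sketch of how the metric is usually computed for a single query. The exact variant used in the Semble benchmark is not stated in this post, so the linear-gain formula, the function names, and the example relevance labels are assumptions.

```python
import math


def dcg_at_k(relevances, k=10):
    """Discounted cumulative gain over the first k results (linear gain)."""
    return sum(rel / math.log2(i + 2) for i, rel in enumerate(relevances[:k]))


def ndcg_at_k(ranked_relevances, all_relevances, k=10):
    """DCG of the returned ranking divided by the DCG of an ideal ranking.

    ranked_relevances: relevance label of each returned result, in order.
    all_relevances: labels of every relevant document for the query,
    including any the retriever failed to return.
    """
    ideal = dcg_at_k(sorted(all_relevances, reverse=True), k)
    return dcg_at_k(ranked_relevances, k) / ideal if ideal > 0 else 0.0


# Hypothetical query with one relevant document, returned at rank 2.
score = ndcg_at_k([0, 1, 0], all_relevances=[1], k=10)
print(round(score, 3))
```

A benchmark score is then typically the mean of this per-query value over all queries, so 0.854 would mean the relevant code usually appears at or near the top of the ten results.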

Install in Claude Code with:

  claude mcp add semble -s user -- uvx --from "semble[mcp]" semble

Or check our README for other installation instructions, benchmarks, and methodology:

Semble: https://github.com/MinishLab/semble

Benchmarks: https://github.com/MinishLab/semble/tree/main/benchmarks

Model: https://huggingface.co/minishlab/potion-code-16M

Let us know if you have any feedback or questions!

Similar Projects

Developer Tools · ●● Solid

MAKO – Open protocol for LLM-optimized web content (93% fewer tokens)

MAKO compresses what matters into a HEAD-friendly payload — frontmatter, declared actions and semantic links — so agents can find relevance without downloading 181KB of navigation, ads and scripts. The project ships a spec plus real tooling (typed SDK, Express middleware, an analyzer/score and edge-friendly /md conversion), which is a rare combo of protocol thinking and usable developer ergonomics. Whether it becomes a standard depends on buy-in from CMS/plugin authors and agent platforms, but technically it's a smart, practical swing at an obvious pain point.

Big Brain · Slick
juanisidoro
113mo ago