
Empirical study: layered retrieval (typed→semantic→grep) scores 0.954 for LLM-generated engineering artifacts. 5 conditions, 3 model tiers, 36 generated ADRs, 23 score files.
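The typed→semantic→grep ordering can be pictured as a fallback chain. The block below is a minimal sketch, not the repository's code. It assumes a "first non-empty layer wins" policy; the study's actual way of combining layers may differ. The toy ADR corpus, the field names (`id`, `type`), the similarity threshold, and every function name are hypothetical. The "semantic" layer is a bag-of-words cosine standing in for a real embedding model.

```python
import re
from collections import Counter
from math import sqrt

# Hypothetical toy corpus: each ADR has an id, a type tag, and body text.
DOCS = [
    {"id": "ADR-001", "type": "database", "text": "Use PostgreSQL for the order service."},
    {"id": "ADR-002", "type": "messaging", "text": "Adopt Kafka for event streaming between services."},
    {"id": "ADR-003", "type": "auth", "text": "Issue short-lived JWT tokens via the gateway."},
]


def typed_lookup(query, docs):
    """Layer 1: exact match on structured fields (ADR id or type tag)."""
    q = query.strip().lower()
    return [d for d in docs if d["id"].lower() == q or d["type"] == q]


def _vec(text):
    """Lowercased token counts; a crude stand-in for an embedding vector."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def _cosine(a, b):
    dot = sum(a[t] * b[t] for t in a)
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def semantic_search(query, docs, threshold=0.2):
    """Layer 2: similarity ranking (bag-of-words cosine, NOT real embeddings)."""
    qv = _vec(query)
    scored = sorted(((_cosine(qv, _vec(d["text"])), d) for d in docs),
                    key=lambda pair: pair[0], reverse=True)
    return [d for score, d in scored if score >= threshold]


def grep_search(pattern, docs):
    """Layer 3: literal, case-insensitive substring match over the raw text."""
    rx = re.compile(re.escape(pattern), re.IGNORECASE)
    return [d for d in docs if rx.search(d["text"])]


def layered_retrieve(query, docs):
    """Try typed, then semantic, then grep; return the first layer that hits."""
    layers = (("typed", typed_lookup), ("semantic", semantic_search), ("grep", grep_search))
    for name, layer in layers:
        hits = layer(query, docs)
        if hits:
            return name, [d["id"] for d in hits]
    return "none", []
```

For example, `layered_retrieve("Postgre", DOCS)` misses the typed layer and scores zero in the token-based semantic layer, because no token equals `postgre`. The grep layer then catches the partial identifier and returns `("grep", ["ADR-001"])`. This is one way a cheap literal fallback can recover hits that similarity search alone drops.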

3 stars · Python

Layered retrieval beats grep alone for LLM-generated engineering docs

by rduffyuk · May 26, 2026 · 3 points · 0 comments

AI Analysis

●● Solid · Big Brain · Niche Gem

Layered retrieval beats semantic search alone on engineering docs, and with it Sonnet matches Opus at roughly one-fifth the model cost.

Strengths
  • Counter-intuitive finding: semantic search alone scores lower than the grep baseline.
  • Dual-judge validation checks that the scoring rubric isn't just self-reported noise.
  • Sonnet with layered retrieval matches Opus performance at one-fifth the cost.
Weaknesses
  • ADR-specific benchmark may not generalize to broader codebase retrieval tasks.
  • No reusable library provided, just data and scripts for reproduction.
Target Audience

AI engineers building internal RAG systems

Similar To

Ragas · Arize Phoenix · TruLens

Similar Projects

Open Source · ●● Solid

EasyMemory – 100% local memory layer and MCP for LLMs

Hooks into MCP (Claude Desktop, Ollama, etc.) and keeps everything on disk. Auto-saved chats, Slack/Notion imports, and file ingestion make it useful right away for local-agent workflows. The hybrid retrieval combo (graph + vector + keyword) without an external vector DB is an interesting engineering choice, but the space is crowded, and I want benchmarks and failure-mode details before recommending it for production.

Niche Gem · Ship It
justvugg
20 · 3mo ago