GitHub Repository

Offline-first MDN Web Docs RAG-MCP server ready for semantic search with hybrid vector and full‑text retrieval

3 stars · TypeScript

Offline-First MDN Web Docs RAG-MCP Server

by d-_-b · Apr 1, 2026 · 2 points · 0 comments

AI Analysis

●●● Banger · Niche Gem · Big Brain

Hybrid vector plus BM25 retrieval tends to outperform vector-only search on technical docs, where exact identifiers such as API and property names matter.
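One common way to merge a vector ranking with a BM25 ranking is Reciprocal Rank Fusion (RRF), sketched below. This is an illustration under assumptions: the post does not say how the server fuses its two result lists, and the function name `reciprocalRankFusion`, the document IDs, and the default k = 60 (the constant used in the original RRF paper) are hypothetical choices.

```typescript
// Hypothetical result shape: each retriever returns document IDs ordered best-first.
type RankedList = string[];

// Reciprocal Rank Fusion: score(d) = sum over lists of 1 / (k + rank(d)),
// where rank is 1-based. Documents missing from a list contribute nothing for it.
function reciprocalRankFusion(lists: RankedList[], k = 60): string[] {
  const scores = new Map<string, number>();
  for (const list of lists) {
    list.forEach((id, index) => {
      const rank = index + 1; // convert 0-based index to 1-based rank
      scores.set(id, (scores.get(id) ?? 0) + 1 / (k + rank));
    });
  }
  // Sort by fused score, highest first, and keep only the IDs.
  return [...scores.entries()]
    .sort((a, b) => b[1] - a[1])
    .map(([id]) => id);
}

// Hypothetical example: vector search favors conceptual matches,
// BM25 favors pages containing the exact identifier.
const vectorHits = ["fetch-guide", "xhr-intro", "promise-basics"];
const bm25Hits = ["abortcontroller-api", "fetch-guide", "request-api"];
console.log(reciprocalRankFusion([vectorHits, bm25Hits]));
```

RRF uses only ranks, not raw scores, so it sidesteps the fact that cosine similarities and BM25 scores live on different scales. A page that appears in both lists (here `fetch-guide`) rises to the top.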

Strengths
  • The pre-processed 50k+ row MDN dataset on HuggingFace saves users from building embeddings themselves.
  • Hybrid retrieval combines semantic understanding with exact keyword matching.
  • Installable via npx with a ~655 MB RAM footprint, which makes it easy to integrate.
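The exact-keyword half of hybrid retrieval is typically a BM25 ranking. Below is a minimal, self-contained Okapi BM25 scorer, assuming naive lowercase tokenization and the common defaults k1 = 1.2 and b = 0.75. It is not LanceDB's full-text implementation, and `tokenize` and `bm25Scores` are hypothetical names.

```typescript
// Naive tokenizer (an assumption): lowercase, split on anything that is not a letter, digit, or underscore.
function tokenize(text: string): string[] {
  return text.toLowerCase().split(/[^a-z0-9_]+/).filter(Boolean);
}

// Scores every document in `docs` against `query` using Okapi BM25.
function bm25Scores(docs: string[], query: string, k1 = 1.2, b = 0.75): number[] {
  const tokenized = docs.map(tokenize);
  const N = tokenized.length;
  const avgdl = tokenized.reduce((sum, d) => sum + d.length, 0) / N;
  const terms = tokenize(query);

  // Document frequency: how many documents contain each query term.
  const df = new Map<string, number>();
  for (const term of new Set(terms)) {
    df.set(term, tokenized.filter((d) => d.includes(term)).length);
  }

  return tokenized.map((doc) => {
    let score = 0;
    for (const term of terms) {
      const tf = doc.filter((t) => t === term).length; // term frequency in this doc
      if (tf === 0) continue;
      const n = df.get(term) ?? 0;
      // Lucene-style IDF, which stays non-negative even for very common terms.
      const idf = Math.log(1 + (N - n + 0.5) / (n + 0.5));
      // Saturating TF, normalized by document length relative to the average.
      score += (idf * tf * (k1 + 1)) / (tf + k1 * (1 - b + (b * doc.length) / avgdl));
    }
    return score;
  });
}

const docs = [
  "The fetch() method starts the process of fetching a resource",
  "AbortController lets you abort one or more web requests",
  "Promises represent the eventual completion of an async operation",
];
console.log(bm25Scores(docs, "AbortController abort"));
```

Only the second document contains the literal tokens `abortcontroller` and `abort`, so it is the only one with a non-zero score. That literal matching is what an embedding-only search can blur.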
Weaknesses
  • MDN content changes over time, so the dataset needs periodic updates to stay current.
  • English-only dataset limits international developer adoption.
Target Audience

AI developers building agents that need documentation access

Similar To

LangChain · LlamaIndex · Sourcegraph Cody

Post Description

Hi.

While tinkering with RAG ideas, I thoroughly processed the entire original MDN Web Docs content, pre-ingested it into LanceDB, uploaded the resulting 50k+ row dataset (https://huggingface.co/datasets/deepsweet/mdn) to HuggingFace, and published a RAG-MCP server (https://github.com/deepsweet/mdn) ready for semantic search with hybrid vector (1024-d) and full-text (BM25) retrieval.
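The vector half of the search ranks rows by how close their 1024-d embeddings are to the query's embedding. Below is a brute-force cosine-similarity top-k search. It assumes cosine as the distance metric (the post does not name one) and uses a hypothetical `Row` shape; LanceDB can also use approximate indexes instead of scanning every row.

```typescript
// Hypothetical row shape for illustration only.
interface Row {
  id: string;
  vector: number[]; // the post states 1024-d vectors; the toy example below uses 3-d
}

// Cosine similarity: dot product divided by the product of the vector norms.
function cosine(a: number[], b: number[]): number {
  if (a.length !== b.length) throw new Error("dimension mismatch");
  let dot = 0;
  let na = 0;
  let nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return na === 0 || nb === 0 ? 0 : dot / (Math.sqrt(na) * Math.sqrt(nb));
}

// Exhaustive top-k: score every row, sort descending, keep the first k.
function topK(rows: Row[], query: number[], k: number): Array<{ id: string; score: number }> {
  return rows
    .map((r) => ({ id: r.id, score: cosine(r.vector, query) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, k);
}

// Toy 3-d vectors standing in for real 1024-d embeddings.
const rows: Row[] = [
  { id: "fetch", vector: [0.9, 0.1, 0.0] },
  { id: "css-grid", vector: [0.0, 0.2, 0.9] },
  { id: "xhr", vector: [0.8, 0.3, 0.1] },
];
console.log(topK(rows, [1, 0, 0], 2));
```

The exhaustive scan is O(rows × dimensions) per query, which is still manageable at around 50k rows but is exactly the cost an approximate index is meant to reduce.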

A screenshot is worth a thousand words: https://raw.githubusercontent.com/deepsweet/mdn/main/example...

Similar Projects