GitHub Repository

Offline-first MDN Web Docs RAG-MCP server ready for semantic search with hybrid vector and full‑text retrieval

3 stars · TypeScript

Offline-First MDN Web Docs RAG-MCP Server

by d-_-b · Apr 1, 2026 · 2 points · 0 comments

AI Analysis

●●● Banger · Niche Gem · Big Brain

Hybrid vector plus BM25 retrieval beats vector-only for technical docs.

Strengths

• The pre-processed dataset of 50k+ MDN rows on HuggingFace saves users from building embeddings themselves.
• Hybrid retrieval combines semantic understanding with exact keyword matching.
• The server is installable via npx and has a ~655 MB RAM footprint, which makes it easy to integrate.

Weaknesses

• MDN content changes over time, so the dataset needs updates to stay current.
• The English-only dataset limits adoption by international developers.

Post Description

Hi.

While tinkering with RAG ideas, I thoroughly processed the entire original content of MDN Web Docs, pre-ingested it into LanceDB, uploaded the resulting dataset of 50k+ rows (https://huggingface.co/datasets/deepsweet/mdn) to HuggingFace, and published a RAG-MCP server (https://github.com/deepsweet/mdn) that is ready for semantic search with hybrid vector (1024-d) and full-text (BM25) retrieval.

A screenshot is worth a thousand words: https://raw.githubusercontent.com/deepsweet/mdn/main/example...
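
For readers new to the hybrid retrieval mentioned in the post, one common way to merge a vector ranking with a BM25 ranking is reciprocal rank fusion (RRF). The sketch below is illustrative only. The post does not say how the server combines its results, so the function name `reciprocalRankFusion`, the constant `k = 60`, and the document ids are all assumptions and are not taken from the project.

```typescript
// Reciprocal rank fusion (RRF): a generic technique for merging ranked lists.
// Illustrative sketch only; NOT taken from the deepsweet/mdn source code.

type FusedHit = { id: string; score: number };

// Each inner array is one ranking of document ids, best match first.
// k dampens the influence of top ranks; 60 is a commonly used default (assumption).
function reciprocalRankFusion(rankings: string[][], k = 60): FusedHit[] {
  const scores = new Map<string, number>();
  for (const ranking of rankings) {
    ranking.forEach((id, index) => {
      // index is 0-based, so the 1-based rank is index + 1.
      const contribution = 1 / (k + index + 1);
      scores.set(id, (scores.get(id) ?? 0) + contribution);
    });
  }
  // Highest fused score first.
  return Array.from(scores, ([id, score]) => ({ id, score })).sort(
    (a, b) => b.score - a.score,
  );
}

// Hypothetical result lists for a query such as "flatten and map an array".
const vectorHits = ["Array.prototype.map", "Array.prototype.flatMap", "Map"];
const bm25Hits = [
  "Array.prototype.flatMap",
  "Array.prototype.forEach",
  "Array.prototype.map",
];

const fused = reciprocalRankFusion([vectorHits, bm25Hits]);
console.log(fused.map((hit) => hit.id));
```

In this made-up example, `Array.prototype.flatMap` ends up first because it ranks highly in both lists, while entries that appear in only one list fall to the bottom. That is the intuition behind combining semantic and keyword retrieval.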