I built a search engine for llms.txt sites

by durovilla·Apr 28, 2026·3 points·0 comments

AI Analysis

Solid · Solve My Problem · Ship It

A Google-style search engine for llms.txt sites, filling a gap where no dedicated crawler existed.

Strengths
  • Site-scoped queries let you target specific docs like Stripe or Supabase.
  • Free access with no API keys removes friction for quick lookups.
  • CLI and MCP server integration enables agent-based workflows.
Weaknesses
  • Search quality depends entirely on llms.txt adoption rates across projects.
  • No clear differentiation from a general search engine queried with a filetype filter.
Target Audience

AI developers and technical writers

Similar To

Perplexity · Phind · Google

Post Description

More and more developer tools are adopting the llms.txt standard to build AI-friendly versions of their docs. The problem is that it's very hard to search across them. So I crawled millions of webpages to build Statespace, a search engine for llms.txt sites. And it's free, no API keys required.

You can run plain queries to search across all sites:

mcp server setup
vector database embeddings
oauth2 refresh token

Or scope your queries to a specific site with "site: query":

stripe: webhook verification
mistral.ai: function calling
docs.supabase.com: edge functions auth

Quotes work like Google for exact phrases:

"context window limit"
vector database "semantic search"
stripe: "webhook signature verification"

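The three query forms above (plain terms, a "site:" prefix, and quoted phrases) can be sketched as a small parser. This is only an illustration of how such a query might be split apart, not Statespace's actual implementation; the names ParsedQuery and parse_query are hypothetical, and the rule that a site scope is a leading token ending in a colon is an assumption based on the examples in the post.

```python
import re
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class ParsedQuery:
    """Hypothetical container for the parts of a search query."""
    site: Optional[str] = None                        # e.g. "stripe" or "docs.supabase.com"
    phrases: List[str] = field(default_factory=list)  # exact phrases given in double quotes
    terms: List[str] = field(default_factory=list)    # remaining unquoted words


# Assumption: a site scope is a leading token such as "stripe:" followed by whitespace.
_SITE_RE = re.compile(r"^([A-Za-z0-9][A-Za-z0-9.-]*):\s+(.*)$")
_PHRASE_RE = re.compile(r'"([^"]+)"')


def parse_query(raw: str) -> ParsedQuery:
    """Split a raw query string into site scope, exact phrases, and plain terms."""
    rest = raw.strip()
    site = None
    match = _SITE_RE.match(rest)
    if match:
        site, rest = match.group(1).lower(), match.group(2)
    phrases = _PHRASE_RE.findall(rest)
    # Drop the quoted phrases, then split whatever is left on whitespace.
    terms = _PHRASE_RE.sub(" ", rest).split()
    return ParsedQuery(site=site, phrases=phrases, terms=terms)
```

For example, parse_query('stripe: "webhook signature verification"') yields site "stripe", one exact phrase, and no plain terms, while parse_query('mcp server setup') yields no site and three plain terms.
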
This is still a work in progress, so I'm looking for feedback and beta testers!

- Search for humans (website): https://statespace.com/

- Search for agents (CLI, SDK, MCP, skill): https://github.com/statespace-tech/statespace

Similar Projects

Productivity · Solid

I treated my CV like a data product: evidence.json, MCP endpoint, llms.txt

An evidence-mapped CV beats a PDF for AI recruiter parsing, but it only helps with applicant tracking systems (ATS) that read these formats.

Big Brain · Ship It
vassilbek
103mo ago

I rebuilt my CV site as a practical, machine-readable portfolio

This is a practical playbook: the repo bundles resume.json, evidence.json, availability.json, an agent-card, and an llms.txt, plus CI checks and IndexNow pushes, so your CV is both human- and agent-discoverable. Clever bits: automated sitemap/index pushes, link-checking Actions, and explicit A2A-style metadata (agent-card.json), which is not something you see on most personal sites. What's missing for wider credibility are outcome metrics and external verification (recruiter-facing analytics, attestations, or an A/B test showing improved contacts), and a clearer signal-to-noise story for what recruiters should actually consume first.

Niche Gem · Solve My Problem
vassilbek
104mo ago