I built a search engine for llms.txt sites

Author: durovilla

by durovilla · Apr 28, 2026 · 3 points · 0 comments


AI Analysis

●● Solid · Solve My Problem · Ship It

A Google-style search engine for llms.txt sites, filling a gap that no dedicated crawler covered before.

Strengths

• Site-scoped queries let you target specific docs like Stripe or Supabase.
• Free access with no API keys removes friction for quick lookups.
• CLI and MCP server integration enables agent-based workflows.

Weaknesses

• Search quality depends entirely on how widely projects adopt llms.txt.
• No clear differentiation from a general search engine query filtered by file type.

Post Description

More and more developer tools are adopting the llms.txt standard to build AI-friendly versions of their docs. The problem is that it's very hard to search across them. So I crawled millions of webpages to build Statespace, a search engine for llms.txt sites. And it's free, no API keys required.

You can run plain queries to search across all sites:

  mcp server setup
  vector database embeddings
  oauth2 refresh token

Or scope your queries to a specific site with "site: query":

  stripe: webhook verification
  mistral.ai: function calling
  docs.supabase.com: edge functions auth

Quotes work like Google for exact phrases:

  "context window limit"
  vector database "semantic search"
  stripe: "webhook signature verification"

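The three query forms above (plain terms, a "site:" scope, and quoted phrases) can be illustrated with a small parser. This is only a sketch of how such a syntax might be split apart, not Statespace's actual implementation: the `ParsedQuery` class, the `parse_query` function, and the rule that a site scope is a leading token ending in a colon followed by whitespace are all assumptions made for illustration.

```python
import re
from dataclasses import dataclass, field
from typing import List, Optional

# Assumption: a site scope is a leading "name:" token followed by whitespace,
# e.g. "stripe: ..." or "docs.supabase.com: ...". The real rules may differ.
_SITE_PREFIX = re.compile(r'^\s*([A-Za-z0-9][A-Za-z0-9.-]*):\s+(.*)$', re.DOTALL)

# Either a double-quoted phrase or a run of non-whitespace characters.
_TOKEN = re.compile(r'"([^"]*)"|(\S+)')


@dataclass
class ParsedQuery:
    """Hypothetical container for the parts of a search query."""
    site: Optional[str] = None
    phrases: List[str] = field(default_factory=list)  # exact-match phrases
    terms: List[str] = field(default_factory=list)    # loose keywords


def parse_query(raw: str) -> ParsedQuery:
    """Split a raw query into an optional site scope, phrases, and terms."""
    parsed = ParsedQuery()
    match = _SITE_PREFIX.match(raw)
    if match:
        parsed.site, raw = match.group(1), match.group(2)
    for phrase, term in _TOKEN.findall(raw):
        if phrase:
            parsed.phrases.append(phrase)
        elif term:
            parsed.terms.append(term)
    return parsed


if __name__ == "__main__":
    for example in (
        'mcp server setup',
        'docs.supabase.com: edge functions auth',
        'stripe: "webhook signature verification"',
    ):
        print(parse_query(example))
```

Keeping the site scope, exact phrases, and loose terms in separate fields would let a backend apply a domain filter, phrase matching, and keyword ranking independently.
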
This is still a work in progress, so I'm looking for feedback and beta testers!

- Search for humans (website): https://statespace.com/

- Search for agents (CLI, SDK, MCP, skill): https://github.com/statespace-tech/statespace

Similar Projects

AI/ML · ● Mid

AgentGrade – agent-readiness guide for your site

Timely concept checking for /llms.txt, but it's just four HTTP GET requests.

Bold Bet

usiegj00

711mo ago

Productivity · ●● Solid

I treated my CV like a data product: evidence.json, MCP endpoint, llms.txt

Evidence-mapped CV beats PDF for AI recruiter parsing, but applies only to ATS that read these formats.

Big Brain · Ship It

vassilbek

103mo ago

Developer Tools · ●● Solid

Scan domain for llms.txt LLMs-full.txt AI aware SEO tool

Validates llms.txt and AI robot rules before AI crawlers ignore your content.

Niche Gem · Ship It

fcpguru

501mo ago

Security · ● Mid

AgentCheck – AI bot posture leaderboard from robots.txt and llms.txt

Transparent AI bot posture tracking, but 'what bots touch you' is already solved by uBlock Origin.

Bold Bet

MK_Phoenix

113mo ago

Productivity · ● Mid

I rebuilt my CV site as a practical, machine-readable portfolio

This is a practical playbook: the repo bundles resume.json, evidence.json, availability.json, an agent‑card and an llms.txt plus CI checks and IndexNow pushes so your CV is both human- and agent-discoverable. Clever bits: automated sitemap/index pushes, link-checking Actions, and explicit A2A‑style metadata (agent‑card.json) — that’s not something you see on most personal sites. What’s missing for wider credibility are outcome metrics and external verification (recruiter-facing analytics, attestations, or an A/B test showing improved contacts), and a clearer signal-to-noise story for what recruiters should actually consume first.

Niche Gem · Solve My Problem

vassilbek

104mo ago

AI/ML · ● Mid

The Global Llms.txt Index

Searchable directory for llms.txt files, though general search engines could index these too.

Ship It

olex-green

2015d ago