Crustdata (YC F24) – Web Search API for Token-Efficient AI Agents

by loondri · Feb 25, 2026 · 10 points · 0 comments

AI Analysis

●●● Banger · Big Brain · Solve My Problem

Entity-mapped web search API cuts agent token waste; targets real Perplexity/Anthropic use case.

Strengths

• Canonical entity graph + continuous indexing solves a genuine AI pain (deduplication at search, not summarization)
• Real-time signals (job changes, funding rounds) turn raw data into actionable triggers for GTM

Weaknesses

• Closed-source pricing model; unclear if ROI beats calling raw API + doing dedup downstream
• No open-source alternative shown, so hard to judge differentiation against homegrown solutions

Post Description

Hi HN! We’re Abhilash Chowdhary, Chris Pisarski and Manmohit Grewal. We built Crustdata (YC F24). Today we’re launching our web search API for AI agents, which not only returns the most relevant documents from the web but also maps each one to the correct entity (person, company or event). Demo video: https://youtu.be/IouWW97hBN8

If you run agents at scale, tokens become a line item. Web data is the worst input: long pages, repeated content, mixed entities, stale claims. The usual flow (web search -> scrape -> summarize + structure) forces the agent to spend tokens on janitorial work before it can take action.

We’re trying to move that work upstream. We keep a canonical graph (ontology) of people and companies: stable internal IDs, aliases, and relationships. Then we continuously index the web and attach each document to the right entity ID. Example: a raw web search for "Stripe pricing changes 2026" returns ~10 results across ~4,000 tokens, mostly redundant. We return 6 deduplicated results in ~1,200 tokens, roughly 70% fewer.
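To make the canonical-graph idea concrete, here is a minimal sketch of a record with a stable ID, aliases, and relationships. It is an illustration under stated assumptions, not Crustdata's actual schema: the `Entity` and `EntityGraph` names, the `co_001`-style IDs, and the "Acme" companies are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Entity:
    """A canonical record: the ID is stable, the names are not."""
    entity_id: str                      # hypothetical internal ID format
    kind: str                           # "person" or "company"
    aliases: set[str] = field(default_factory=set)
    related: dict[str, str] = field(default_factory=dict)  # relation -> entity_id


class EntityGraph:
    """Maps surface names to the entity IDs they might refer to."""

    def __init__(self) -> None:
        self._by_id: dict[str, Entity] = {}
        self._alias_index: dict[str, set[str]] = {}

    def add(self, entity: Entity) -> None:
        self._by_id[entity.entity_id] = entity
        for alias in entity.aliases:
            # Case-insensitive lookup key; one alias may point at many entities.
            self._alias_index.setdefault(alias.casefold(), set()).add(entity.entity_id)

    def candidates(self, name: str) -> set[str]:
        """Return every entity ID that a surface name could refer to."""
        return set(self._alias_index.get(name.casefold(), set()))


graph = EntityGraph()
# Hypothetical data: a rebranded company and an unrelated one sharing a short name.
graph.add(Entity("co_001", "company", {"Acme Robotics", "Acme Labs", "Acme"}))
graph.add(Entity("co_002", "company", {"Acme Foods", "Acme"}))

ambiguous = graph.candidates("ACME")         # both companies match
unambiguous = graph.candidates("Acme Labs")  # only the rebranded one matches
```

The point of the sketch is that a name lookup returns a set of candidate IDs rather than a single answer, so an ambiguous string like "Acme" is visible as ambiguous instead of silently picking one company.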

This is not just about saving tokens. It also matters because the common failure isn’t “search missed something.” It’s “search found something about the wrong entity.” Names collide. Companies rebrand. Domains move. Press releases get syndicated and look like independent sources. If you treat strings as IDs, you eventually attach evidence to the wrong person/company and the agent takes a confident action based on that mistake.
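The syndication problem can be illustrated with a small sketch: fingerprint each document's normalized text and count distinct fingerprints rather than distinct URLs. This is an assumption about one simple way to do it, not a description of Crustdata's dedupe step. It only catches copies that are identical after whitespace and case normalization; lightly edited copies would need a near-duplicate technique such as shingling. The URLs and texts are made up.

```python
import hashlib
import re


def fingerprint(text: str) -> str:
    """Hash the text after collapsing whitespace and ignoring case."""
    normalized = re.sub(r"\s+", " ", text).strip().casefold()
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def count_independent_sources(docs: list[dict]) -> int:
    """Count distinct document bodies, not distinct URLs."""
    return len({fingerprint(d["text"]) for d in docs})


# Hypothetical results: the first two are the same press release on two domains.
docs = [
    {"url": "https://example.com/pr", "text": "Acme Labs  announces a new product."},
    {"url": "https://example.org/wire", "text": "acme labs announces a new product.\n"},
    {"url": "https://example.net/review", "text": "Hands-on with the new Acme Labs product."},
]

independent = count_independent_sources(docs)  # three URLs, two distinct bodies
```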

Under the hood, we run a continuous pipeline that updates the entity-linked index: discover -> fetch -> extract -> dedupe -> entity resolution -> attach -> index. We serve this index through our search API.

We didn’t start with web search. We spent ~2 years building verified people + company data from higher-trust sources. That forced us to build identity as a system, not a string. When we tried to bolt on web search and started building our integrated index of documents + people + companies, we ended up with a pile of local fixes: parser tweaks, domain rules, prompt hacks. Each fix helped one case and broke another because identity isn’t local. That’s when we committed to an entity-first index: pay the entity resolution cost once, then reuse it everywhere.

If you’re building AI agents for sales, recruiting, or investing that do a lot of web searches for people and companies, we’d love for you to try our web search APIs. https://crustdata.com/demo
