SurvivalIndex – which developer tools do AI agents choose?

Author: scalefirst

by scalefirst · Mar 7, 2026 · 1 point · 3 comments


AI Analysis

●● Solid · Big Brain · Niche Gem

Measures what agents *actually pick*, not just what they can do, revealing a tool blindness: Claude overlooks tools it is able to use.

Strengths

• Novel failure-mode detection: agents can use tools but don't reach for them, which is orthogonal to what BFCL benchmarks measure.
• Structured methodology with a human coefficient variable and transparent AAS scoring; a methodology page is provided.
• Genuine empirical data: standardized repos run with natural-language prompts and no priming.

Weaknesses

• Only 33 tools tracked and only 6 marked as 'hidden gems': the sample is too small to support claims about survivorship patterns.
• No evidence that agents were tested across recent tool versions; the methodology doesn't specify Claude version dates.
• Leaderboard lacks confidence intervals or significance tests; agreement among human raters is not shown.

Post Description

We've been running coding agents against standardized repos with natural-language prompts — no tool names, no hints — and measuring what they actually choose.

Early finding: Claude Code picks Custom/DIY in 12 of 20 categories. Not because it can't use the tools (BFCL scores suggest it can) but because it doesn't reach for them. That's a different failure mode than capability benchmarks measure.
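A figure like "Custom/DIY in 12 of 20 categories" could be tallied along the lines of the sketch below. This is only an illustration under stated assumptions: the record shape `(category, choice)`, the function names, and the sample rows are all hypothetical, and "picks" is assumed to mean the most frequent choice within a category. The project's real pipeline may count differently.

```python
from collections import Counter, defaultdict

CUSTOM = "Custom/DIY"  # label assumed for runs where the agent wrote its own solution


def modal_pick_by_category(trials):
    """Map each category to the choice the agent made most often in it."""
    counts = defaultdict(Counter)
    for category, choice in trials:
        counts[category][choice] += 1
    # most_common(1) returns [(choice, count)] for the top choice
    return {cat: c.most_common(1)[0][0] for cat, c in counts.items()}


def custom_categories(trials):
    """Return the sorted categories whose most frequent choice is Custom/DIY."""
    picks = modal_pick_by_category(trials)
    return sorted(cat for cat, choice in picks.items() if choice == CUSTOM)


# Made-up sample rows, not SurvivalIndex data.
trials = [
    ("auth", CUSTOM), ("auth", CUSTOM), ("auth", "ToolA"),
    ("queues", "ToolB"), ("queues", "ToolB"), ("queues", CUSTOM),
    ("caching", CUSTOM),
]

diy = custom_categories(trials)
print(f"{len(diy)} of {len(set(c for c, _ in trials))} categories: {diy}")
```

Counting by modal choice keeps a single stray Custom/DIY run from flipping a category, at the cost of hiding how close the split was.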

We score each tool on five factors: agent visibility, pick rate vs Custom/DIY, cross-context breadth, expert human ratings, and implementation success rate. Tools with a survival score above 1 persist. Below 1, agents synthesize around them.
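To make the threshold concrete, here is a minimal sketch of how five such factors might be folded into one score and compared against 1. The combining formula is an assumption made for illustration, not the published SurvivalIndex formula (see the methodology page for that). The field names, value ranges, and sample numbers are likewise hypothetical.

```python
from dataclasses import dataclass


@dataclass
class ToolObservation:
    # All fields are hypothetical stand-ins for the five factors named in the post.
    visibility: float         # agent visibility, assumed in 0..1
    pick_rate: float          # share of runs where the agent chose the tool, 0..1
    diy_rate: float           # share of runs where it went Custom/DIY instead, 0..1
    breadth: float            # cross-context breadth, assumed in 0..1
    human_coefficient: float  # expert rating multiplier, assumed centered on 1.0
    success_rate: float       # implementation success rate, 0..1


def survival_score(obs: ToolObservation) -> float:
    """Illustrative score: pick-vs-DIY ratio damped by the other factors.

    ASSUMPTION: this product form is invented for the example.
    """
    if obs.diy_rate == 0:
        return float("inf") if obs.pick_rate > 0 else 0.0
    ratio = obs.pick_rate / obs.diy_rate
    return (ratio * obs.visibility * obs.breadth
            * obs.human_coefficient * obs.success_rate)


def verdict(score: float) -> str:
    """Apply the threshold from the post; a score of exactly 1 is left undecided."""
    if score > 1:
        return "persists"
    if score < 1:
        return "synthesized around"
    return "at threshold"


# Made-up observations, not SurvivalIndex data.
strong = ToolObservation(0.9, 0.6, 0.3, 0.8, 1.1, 0.9)
weak = ToolObservation(0.5, 0.2, 0.7, 0.4, 0.9, 0.8)

for name, obs in [("strong", strong), ("weak", weak)]:
    print(name, round(survival_score(obs), 3), verdict(survival_score(obs)))
```

A ratio against Custom/DIY is used here because it naturally crosses 1 when a tool is picked as often as agents write their own solution. Whether the real index works this way is not stated in the post.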

Methodology is at survivalindex.org/methodology. Very curious what people think of the measurement approach, especially the human coefficient variable.

Similar Projects

Developer Tools · ●● Solid

Ambits – Claude Code agent coverage tooling

Tails Claude Code's JSONL and paints every function/struct/class by read-depth (unseen → name-only → full body) in a live terminal tree — plus automatic staleness marking when files change. The multi-agent tracking and optional Serena LSP backend are smart touches that make this more than a neat demo: it's practical observability for agent-driven workflows, though it's tightly coupled to Claude/Serena ecosystems.

Niche Gem · Wizardry

joshLong145

10 · 3mo ago

Developer Tools · ● Mid

Codingagents.md – The open directory for AI coding agents

This is the kind of curated index I wish existed yesterday: agent pages, config format examples, SDK links and two named protocols (MCP/ACP) all collected in one place, plus a weekly-ranked table of models with context-length notes. It feels like real curation rather than link spam, but the site leans on lists and scores. If it showed the benchmark methodology, reproducible tests or interactive demos, the rankings would become trustworthy rather than just convenient.

Solve My Problem · Niche Gem

meame2010

54 · 4mo ago

Security · ●● Solid