GitHub Repository

The MCP developer toolkit. Scaffold, lint, test, benchmark, and publish MCP servers.

4 stars · TypeScript

AgentDX – Open-source linter and LLM benchmark for MCP servers

Author: yamarldfst

by yamarldfst · Feb 18, 2026 · 1 point · 0 comments


AI Analysis

Banger · Solve My Problem · Niche Gem · Big Brain

First linter and benchmark for MCP servers; it catches vague schemas before LLMs pick the wrong tools.

Strengths

•Auto-spawns the MCP server, connects as a client, and evaluates the real tool definitions, with no manual setup.
•Dual-mode approach: fast static lint (free) + LLM bench (weighted scoring on actual tool use).
•Addresses real pain: most MCP servers ship with vague descriptions, causing silent LLM failures.
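The bench side folds several measurements into one weighted number. A minimal sketch of that kind of weighted scoring, assuming each bench dimension yields an accuracy in [0, 1]: the five dimension names come from the post's description of `bench`, but the weights and the `agentDxScore` function are hypothetical and are not AgentDX's actual formula.

```typescript
// The five bench dimensions named in the post. Each result is assumed to be
// an accuracy in [0, 1] (e.g. fraction of scenarios passed).
type Dimension =
  | "toolSelection"
  | "parameterCorrectness"
  | "ambiguityHandling"
  | "multiToolOrchestration"
  | "errorRecovery";

// Hypothetical weights for illustration only; AgentDX's real weights may differ.
const WEIGHTS: Record<Dimension, number> = {
  toolSelection: 0.3,
  parameterCorrectness: 0.25,
  ambiguityHandling: 0.15,
  multiToolOrchestration: 0.15,
  errorRecovery: 0.15,
};

// Weighted average of the dimension results, scaled to an integer 0-100.
function agentDxScore(results: Record<Dimension, number>): number {
  let weighted = 0;
  let totalWeight = 0;
  for (const dim of Object.keys(WEIGHTS) as Dimension[]) {
    const value = Math.min(1, Math.max(0, results[dim])); // clamp to [0, 1]
    weighted += WEIGHTS[dim] * value;
    totalWeight += WEIGHTS[dim];
  }
  // Dividing by the total weight keeps the score in 0-100 even if the
  // weights do not sum exactly to 1.
  return Math.round((weighted / totalWeight) * 100);
}
```

Weighting tool selection highest reflects the post's framing that picking the wrong tool is the main failure; any real weighting would be a product decision.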

Weaknesses

•Early alpha; the bench command makes LLM calls sequentially and is slow (parallelization is promised but not yet shipped).
•Narrow audience: only relevant if you're building or using MCP servers, which is still a young ecosystem.

Post Description

MCP servers are proliferating fast, but most have vague tool descriptions and incomplete schemas that make LLMs pick the wrong tool or fill parameters incorrectly.

AgentDX is a CLI that measures this. It has two commands:

- `npx agentdx lint` — static analysis of tool descriptions, schemas, and naming. 18 rules, zero config, no API key. Produces a lint score.

- `npx agentdx bench` — sends your tool definitions to an LLM (Anthropic, OpenAI, or Ollama) and evaluates tool selection accuracy, parameter correctness, ambiguity handling, multi-tool orchestration, and error recovery. Produces an Agent DX Score (0-100).
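To make "static analysis of tool descriptions, schemas, and naming" concrete, here is a minimal sketch of a few lint rules over MCP tool definitions (`name`, `description`, and a JSON Schema `inputSchema`, as returned by `tools/list`). The rule names, the 20-character threshold, the snake_case convention, and the penalty-based score are all hypothetical illustrations, not AgentDX's 18 rules or its scoring.

```typescript
// Minimal shape of an MCP tool definition as returned by tools/list.
interface JsonSchemaProperty {
  type?: string;
  description?: string;
}

interface ToolDefinition {
  name: string;
  description?: string;
  inputSchema: {
    type: "object";
    properties?: Record<string, JsonSchemaProperty>;
    required?: string[];
  };
}

type Severity = "error" | "warning";

interface Finding {
  tool: string;
  rule: string;
  severity: Severity;
  message: string;
}

const MIN_DESCRIPTION_LENGTH = 20; // assumed threshold

// Illustrative rules only; these are not AgentDX's actual rules.
function lintTool(tool: ToolDefinition): Finding[] {
  const findings: Finding[] = [];
  const add = (rule: string, severity: Severity, message: string) =>
    findings.push({ tool: tool.name, rule, severity, message });

  // Descriptions are what the LLM reads to choose a tool.
  const desc = tool.description?.trim() ?? "";
  if (desc.length === 0) {
    add("description-missing", "error", "Tool has no description.");
  } else if (desc.length < MIN_DESCRIPTION_LENGTH) {
    add("description-too-short", "warning", `Description is under ${MIN_DESCRIPTION_LENGTH} characters.`);
  }

  // Assumed naming convention: lowercase snake_case.
  if (!/^[a-z][a-z0-9_]*$/.test(tool.name)) {
    add("name-snake-case", "warning", `Name "${tool.name}" is not snake_case.`);
  }

  // Undocumented parameters make the LLM guess their meaning.
  const props = tool.inputSchema.properties ?? {};
  for (const [param, schema] of Object.entries(props)) {
    if (!schema.description) {
      add("param-description-missing", "warning", `Parameter "${param}" has no description.`);
    }
  }

  // A required parameter with no schema is an incomplete definition.
  for (const param of tool.inputSchema.required ?? []) {
    if (!(param in props)) {
      add("required-not-defined", "error", `Required parameter "${param}" is not in properties.`);
    }
  }
  return findings;
}

// Assumed scoring: start at 100, subtract 10 per error and 3 per warning.
function lintScore(findings: Finding[]): number {
  const penalty = findings.reduce((sum, f) => sum + (f.severity === "error" ? 10 : 3), 0);
  return Math.max(0, 100 - penalty);
}
```

For example, a tool named `doThing` described only as "Does it", with an undocumented `q` parameter and a required `limit` that is never defined, trips all four rule types above.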

It auto-detects the server entry point, spawns it, connects as an MCP client, and reads tools via the protocol. Bench auto-generates test scenarios from your tool definitions.
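Reading tools "via the protocol" means speaking JSON-RPC 2.0 over the server's stdio, one JSON message per line. A minimal sketch of the messages involved, assuming the stdio transport and protocol version `2024-11-05` (one published MCP version; a real client negotiates with the server). The helper names (`frame`, `initializeRequest`, `listToolsRequest`, `extractTools`) are hypothetical and this is not AgentDX's code. A real client writes the framed lines to the spawned server's stdin (for example via Node's `child_process.spawn`), waits for the `initialize` response before sending `notifications/initialized` and `tools/list`, and follows `nextCursor` for paginated tool lists; process handling and pagination are omitted here.

```typescript
// JSON-RPC 2.0 shapes used by MCP.
interface JsonRpcRequest {
  jsonrpc: "2.0";
  id: number;
  method: string;
  params?: Record<string, unknown>;
}

interface JsonRpcNotification {
  jsonrpc: "2.0";
  method: string;
  params?: Record<string, unknown>;
}

interface ToolSummary {
  name: string;
  description?: string;
  inputSchema: Record<string, unknown>;
}

// The stdio transport frames each message as a single line of JSON.
function frame(message: JsonRpcRequest | JsonRpcNotification): string {
  return JSON.stringify(message) + "\n";
}

// Step 1: the client opens the session.
function initializeRequest(id: number): JsonRpcRequest {
  return {
    jsonrpc: "2.0",
    id,
    method: "initialize",
    params: {
      protocolVersion: "2024-11-05",
      capabilities: {},
      clientInfo: { name: "example-client", version: "0.0.1" }, // hypothetical client identity
    },
  };
}

// Step 2: sent after the initialize response arrives.
const INITIALIZED: JsonRpcNotification = { jsonrpc: "2.0", method: "notifications/initialized" };

// Step 3: ask the server for its tool definitions.
function listToolsRequest(id: number): JsonRpcRequest {
  return { jsonrpc: "2.0", id, method: "tools/list" };
}

// Scan newline-delimited server output for the tools/list response.
function extractTools(stdout: string, requestId: number): ToolSummary[] | undefined {
  for (const line of stdout.split("\n")) {
    if (line.trim() === "") continue;
    const msg = JSON.parse(line);
    if (msg.id === requestId && msg.result && Array.isArray(msg.result.tools)) {
      return msg.result.tools as ToolSummary[];
    }
  }
  return undefined;
}
```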

Built in TypeScript, MIT licensed. Early alpha: the bench command works but is slow because it makes LLM calls sequentially; parallelization is next. Feedback welcome.
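One common way to parallelize independent LLM calls without overrunning provider rate limits is a bounded worker pool. A minimal sketch, assuming each bench scenario can be evaluated independently; `mapWithConcurrency` is a hypothetical helper, not part of AgentDX.

```typescript
// Runs `worker` over `items` with at most `limit` calls in flight,
// returning results in the original item order.
async function mapWithConcurrency<T, R>(
  items: readonly T[],
  limit: number,
  worker: (item: T, index: number) => Promise<R>,
): Promise<R[]> {
  if (!Number.isInteger(limit) || limit < 1) {
    throw new RangeError("limit must be a positive integer");
  }
  const results = new Array<R>(items.length);
  let next = 0;

  // Each runner claims the next unprocessed index until none remain.
  // Claiming is synchronous, so two runners never take the same index.
  async function runner(): Promise<void> {
    while (next < items.length) {
      const index = next++;
      results[index] = await worker(items[index], index);
    }
  }

  const runners = Array.from({ length: Math.min(limit, items.length) }, () => runner());
  await Promise.all(runners);
  return results;
}
```

A bench loop could then call something like `mapWithConcurrency(scenarios, 4, runScenario)`, where `scenarios` and `runScenario` stand in for whatever the tool uses internally; the limit trades wall-clock time against provider rate limits.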