Reduce LLM token use by ~30% with this MCP/CLI tool (Claude benchmarked)
Token-efficient code indexing with adaptive callers tracing cuts Claude costs by 34%.
The MCP developer toolkit. Scaffold, lint, test, benchmark, and publish MCP servers.
First linter + benchmark for MCP servers; catches vague schemas before LLMs pick wrong tools.
Built for MCP server developers and AI agent builders.
ESLint (linting philosophy) · Anthropic MCP specification validators
AgentDX is a CLI that measures how well LLMs can pick and correctly call your MCP server's tools. Two commands:
- `npx agentdx lint` — static analysis of tool descriptions, schemas, and naming. 18 rules, zero config, no API key. Produces a lint score.
- `npx agentdx bench` — sends your tool definitions to an LLM (Anthropic, OpenAI, or Ollama) and evaluates tool selection accuracy, parameter correctness, ambiguity handling, multi-tool orchestration, and error recovery. Produces an Agent DX Score (0-100).
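To make the lint side concrete, here is a minimal TypeScript sketch of static checks on tool definitions. The three rules, the 20-character description threshold, the naming pattern, and the 10-points-per-issue scoring are hypothetical illustrations, not AgentDX's actual 18 rules or its scoring formula. The `ToolDefinition` shape mirrors the `name`, `description`, and `inputSchema` fields that MCP servers return from `tools/list`.

```typescript
// Assumed shape of an MCP tool as returned by tools/list.
interface JsonSchemaProperty {
  type?: string;
  description?: string;
}

interface ToolDefinition {
  name: string;
  description?: string;
  inputSchema: {
    type: "object";
    properties?: Record<string, JsonSchemaProperty>;
    required?: string[];
  };
}

interface LintIssue {
  rule: string;
  tool: string;
  message: string;
}

// Hypothetical naming rule: letters, digits, underscores, hyphens, 1 to 64 chars.
const TOOL_NAME_PATTERN = /^[a-zA-Z0-9_-]{1,64}$/;
// Hypothetical threshold below which a description is treated as too vague.
const MIN_DESCRIPTION_LENGTH = 20;

// Illustrative rules only; not AgentDX's actual rule set.
function lintTool(tool: ToolDefinition): LintIssue[] {
  const issues: LintIssue[] = [];
  const description = (tool.description ?? "").trim();
  if (description.length < MIN_DESCRIPTION_LENGTH) {
    issues.push({
      rule: "description-too-short",
      tool: tool.name,
      message: `Description is missing or under ${MIN_DESCRIPTION_LENGTH} characters.`,
    });
  }
  if (!TOOL_NAME_PATTERN.test(tool.name)) {
    issues.push({
      rule: "invalid-tool-name",
      tool: tool.name,
      message: "Name should use only letters, digits, '_' or '-' (max 64 chars).",
    });
  }
  // Every parameter should explain itself, or the LLM has to guess its meaning.
  const properties = tool.inputSchema.properties ?? {};
  for (const param of Object.keys(properties)) {
    if (!properties[param].description) {
      issues.push({
        rule: "param-missing-description",
        tool: tool.name,
        message: `Parameter "${param}" has no description.`,
      });
    }
  }
  return issues;
}

// Hypothetical scoring: start at 100, deduct 10 points per issue, floor at 0.
function lintScore(tools: ToolDefinition[]): { score: number; issues: LintIssue[] } {
  const issues: LintIssue[] = [];
  for (const tool of tools) {
    issues.push(...lintTool(tool));
  }
  return { score: Math.max(0, 100 - 10 * issues.length), issues };
}
```

Checks like these need no API key because they only inspect the JSON the server already exposes; whether a description actually steers an LLM to the right tool is what the bench step measures.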
AgentDX auto-detects the server entry point, spawns it, connects to it as an MCP client, and reads the tools via the protocol. The bench command auto-generates test scenarios from your tool definitions.
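The protocol side can be sketched as follows, assuming the standard MCP `tools/list` JSON-RPC method; spawning the process, the `initialize` handshake, and pagination are omitted. Scenario generation and grading here are deliberately naive and hypothetical: `generateScenarios` turns each tool description into one prompt, and `grade` checks the tool name and required parameters of a model's call. This is not how AgentDX necessarily builds or scores its scenarios.

```typescript
interface ToolDefinition {
  name: string;
  description?: string;
  inputSchema: {
    type: "object";
    properties?: Record<string, { type?: string; description?: string }>;
    required?: string[];
  };
}

// JSON-RPC 2.0 request envelope used by MCP.
interface JsonRpcRequest {
  jsonrpc: "2.0";
  id: number;
  method: string;
  params?: Record<string, unknown>;
}

// Hypothetical scenario format: a user prompt plus the call we expect.
interface Scenario {
  prompt: string;
  expectedTool: string;
  expectedParams: string[];
}

// A tool call as extracted from the model's reply (assumed, provider-agnostic).
interface ToolCall {
  name: string;
  arguments: Record<string, unknown>;
}

// Request a server's tool list; sent after the initialize handshake.
function listToolsRequest(id: number): JsonRpcRequest {
  return { jsonrpc: "2.0", id, method: "tools/list" };
}

// Pull tool definitions out of one tools/list response (ignores nextCursor paging).
function parseToolsResponse(raw: string): ToolDefinition[] {
  const msg = JSON.parse(raw) as {
    result?: { tools?: ToolDefinition[] };
    error?: { code: number; message: string };
  };
  if (msg.error) {
    throw new Error(`tools/list failed (${msg.error.code}): ${msg.error.message}`);
  }
  return msg.result?.tools ?? [];
}

// Naive generator: one prompt per tool, derived from its description.
function generateScenarios(tools: ToolDefinition[]): Scenario[] {
  return tools.map((tool) => ({
    prompt: `Please help with this task: ${tool.description ?? tool.name}`,
    expectedTool: tool.name,
    expectedParams: tool.inputSchema.required ?? [],
  }));
}

// Grade one model call: right tool, and all required parameters supplied.
function grade(scenario: Scenario, call: ToolCall | null): { toolCorrect: boolean; paramsCorrect: boolean } {
  if (call === null || call.name !== scenario.expectedTool) {
    return { toolCorrect: false, paramsCorrect: false };
  }
  const args = call.arguments;
  return { toolCorrect: true, paramsCorrect: scenario.expectedParams.every((p) => p in args) };
}
```

Over the stdio transport, each request would be written to the spawned server's stdin as a single newline-terminated line, e.g. `JSON.stringify(listToolsRequest(2)) + "\n"`.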
Built in TypeScript, MIT licensed. Early alpha: the bench command works but is slow because it makes its LLM calls sequentially; parallelization is next. Feedback welcome.
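Since the bench slowness comes from sequential LLM calls, one common remedy is bounded concurrency: keep a fixed number of requests in flight instead of firing them all at once, which also helps stay under provider rate limits. A minimal sketch, with `mapWithConcurrency` as a hypothetical helper rather than AgentDX code:

```typescript
// Run async tasks with at most `limit` in flight, preserving input order.
async function mapWithConcurrency<T, R>(
  items: T[],
  limit: number,
  worker: (item: T, index: number) => Promise<R>,
): Promise<R[]> {
  if (limit < 1) {
    throw new RangeError("limit must be at least 1");
  }
  const results: R[] = new Array(items.length);
  let next = 0;
  // Each runner claims the next unprocessed index until none remain.
  // Claiming is safe because JavaScript runs this synchronous step on one thread.
  async function runner(): Promise<void> {
    while (next < items.length) {
      const i = next++;
      results[i] = await worker(items[i], i);
    }
  }
  const runners: Promise<void>[] = [];
  for (let k = 0; k < Math.min(limit, items.length); k++) {
    runners.push(runner());
  }
  await Promise.all(runners);
  return results;
}
```

A bench run could wrap each scenario's LLM call in `worker` with a small limit; because results are stored by index, per-scenario scores still line up with their scenarios.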
Five-LLM consensus catches prompt injection patterns static analysis misses.
Anonymous LLM feedback loop for MCP servers — telemetry without user effort.
Offline schema snapshots keep AI agents from wrecking your production database.
Offline version-accurate API queries beat fetching docs from remote servers.
Sandboxed MCP server lets LLMs run Ghidra and Radare2 without blowing up your host.