Probus, AI vuln scanner (PRs merged in Vercel AI SDK, n8n, LangGraph)

Author: etairl

by etairl·May 5, 2026·1 point·0 comments


AI Analysis


Found real bugs in n8n and Vercel AI SDK using a three-agent verification pipeline.

Strengths

•QA agent independently verifies findings to reject false positives.
•Proven track record with merged PRs in major projects like LangGraph.
•Cost-effective scanning using open models via OpenRouter.

Weaknesses

•Relies on external LLM providers, introducing latency and cost variables.
•Scope limited to static code analysis without runtime behavior testing.

Post Description

Hi HN, I've been running this on my own dependency tree for the past few months. Probus is a vulnerability scanner that uses three agents. One picks the files worth deep-scanning. One writes raw findings. The third reads the code on its own and rejects any finding that doesn't have a real attack vector. While building it I pointed it at projects I use day to day. Bugs that came out of this and got reported as PRs:

•n8n: password-reset JWTs being logged at debug level (n8n-io/n8n#29405)
•Vercel AI SDK: role: "system" injection in createAgentUIStream, a runtime schema bypass in ToolLoopAgent, and a prototype-property collision in getMediaTypeFromUrl (vercel/ai#14749, #14750 merged, #14751 merged)
•LangGraph.js: NoSQL injection in MongoDBSaver via unvalidated thread_id / checkpoint_ns / checkpoint_id types (langchain-ai/langgraphjs#2353)
•browser-use: path traversal in remote-fetched templates.json fields (browser-use/browser-use#4777)
•Haystack: SSRF and arbitrary file read via unrestricted OpenAPI $ref resolution, path traversal in the image converter, and unbounded HTTP body reads in LinkContentFetcher (deepset-ai/haystack#11226, #11228, #11229)
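Two of the bug classes in this list are easy to show in isolation. The following is a minimal sketch, not the code from any of the projects above: MEDIA_TYPES, unsafeMediaType, safeMediaType, and requireString are hypothetical names made up for illustration. It shows how a lookup on a plain object can collide with inherited prototype properties, and how a runtime type check keeps an object-valued identifier from reaching a query filter.

```typescript
// Hypothetical lookup table (illustration only, not the Vercel AI SDK's code).
const MEDIA_TYPES: Record<string, string> = {
  png: "image/png",
  jpg: "image/jpeg",
  pdf: "application/pdf",
};

// Unsafe: a plain object inherits keys such as "constructor" and "toString"
// from Object.prototype, so an attacker-chosen key can resolve to a function
// instead of a media-type string.
function unsafeMediaType(ext: string): unknown {
  return MEDIA_TYPES[ext];
}

// Safer: only accept keys that the table itself defines.
function safeMediaType(ext: string): string | undefined {
  return Object.prototype.hasOwnProperty.call(MEDIA_TYPES, ext)
    ? MEDIA_TYPES[ext]
    : undefined;
}

// Hypothetical guard for identifiers such as thread_id: if a value like
// { $ne: null } is passed where a string is expected, a MongoDB-style filter
// would treat it as a query operator. Rejecting non-strings closes that path.
function requireString(name: string, value: unknown): string {
  if (typeof value !== "string") {
    throw new TypeError(`${name} must be a string`);
  }
  return value;
}
```

With this sketch, unsafeMediaType("constructor") returns the inherited Object constructor function, while safeMediaType("constructor") returns undefined.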

The false positive rate got low enough that I'd rather have other people running it than keep it private, so it's now public under Apache 2.0.

How it works:

•Analyst (1 LLM call): reads the repo and picks 50 to 500 files to deep-scan based on entry points, third-party surface, and dangerous sinks.
•Researcher (per file): walks call chains and writes raw findings.
•QA (per file): re-reads the code against each claim with no access to the researcher's reasoning, and rejects anything that doesn't have a real attack vector.

Keeping the QA agent isolated from the researcher is what got noise under control. If it sees the researcher's reasoning, it just agrees with it.
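The isolation between researcher and QA can be sketched as a data-flow constraint. This is an assumption-laden illustration, not Probus's actual implementation: the Finding and Claim types, the scan function, and the three agent signatures are all hypothetical, and the agents are passed in as plain async functions so the sketch runs without any LLM. The point is that the verifier receives only the claim and the source, never the researcher's reasoning.

```typescript
// Hypothetical shapes; Probus's real data model may differ.
interface Finding {
  file: string;
  claim: string;
  reasoning: string; // the researcher's chain of thought
}

// What the verifier is allowed to see: the claim, without the reasoning.
type Claim = Pick<Finding, "file" | "claim">;

interface Agents {
  analyst: (files: string[]) => Promise<string[]>;
  researcher: (file: string, source: string) => Promise<Finding[]>;
  qa: (claim: Claim, source: string) => Promise<boolean>;
}

async function scan(
  sources: Record<string, string>,
  agents: Agents,
): Promise<Claim[]> {
  // Step 1: a single analyst call picks the files worth deep-scanning.
  const selected = await agents.analyst(Object.keys(sources));
  const confirmed: Claim[] = [];

  for (const file of selected) {
    const source = sources[file];
    if (source === undefined) continue; // analyst named a file we do not have

    // Step 2: the researcher produces raw findings for this file.
    const findings = await agents.researcher(file, source);

    for (const finding of findings) {
      // Step 3: build a fresh object so the reasoning cannot leak to QA.
      const claim: Claim = { file: finding.file, claim: finding.claim };
      if (await agents.qa(claim, source)) {
        confirmed.push(claim);
      }
    }
  }
  return confirmed;
}
```

Copying the claim into a new object, rather than passing the finding through a narrower type, matters here: a type annotation alone would still hand the reasoning field to the verifier at runtime.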

Each agent runs in its own query() session through the Claude Agent SDK with a filesystem sandbox scoped to the target repo. Cost is tuned for open models. About $0.50 per file with Qwen 3.6 plus DeepSeek v4 Pro on OpenRouter. OpenAI is around 2.5x that. Anthropic is around 10x.

    npm install -g probus
    probus scan ./my-app

Things I'd like feedback on:

•The QA prompt took the most iteration. Happy to walk through it if anyone is working on similar verifier-agent patterns.
•I want to publish a public benchmark against a vulhub-style corpus. Suggestions on which repos to run it against would be helpful.
•The analyst step is a single LLM call right now. On large monorepos it sometimes misses things. Thinking about a hierarchical version.
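The per-file cost figures quoted in the post can be turned into a rough budget estimate. The sketch below is back-of-the-envelope arithmetic only: the constants come from the approximate numbers in the post ("about $0.50 per file", "around 2.5x", "around 10x"), while the names estimateScanCostUsd and PROVIDER_MULTIPLIER are hypothetical and not part of Probus.

```typescript
// Approximate figure from the post: open models via OpenRouter.
const BASE_COST_PER_FILE_USD = 0.5;

// Approximate multipliers relative to the open-model baseline, per the post.
const PROVIDER_MULTIPLIER = {
  open: 1,
  openai: 2.5,
  anthropic: 10,
} as const;

type Provider = keyof typeof PROVIDER_MULTIPLIER;

// Rough estimate: files scanned x base cost x provider multiplier.
function estimateScanCostUsd(fileCount: number, provider: Provider): number {
  if (!Number.isInteger(fileCount) || fileCount < 0) {
    throw new RangeError("fileCount must be a non-negative integer");
  }
  return fileCount * BASE_COST_PER_FILE_USD * PROVIDER_MULTIPLIER[provider];
}
```

Under these figures, the analyst's range of 50 to 500 selected files works out to roughly $25 to $250 per scan on open models, about $625 at the top of the range on OpenAI, and about $2,500 on Anthropic. Actual costs will vary with file size and model pricing.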

https://github.com/etairl/Probus