LLMtary – Local LLM Red-Teaming Tool
Autonomous exploit validation with real command execution is genuinely wild for a local tool.

Multi-agent AI chains real exploits, and a judge agent kills false positives. Results arrive in about two hours, not weeks.
Security teams, startups, and enterprises needing fast pentesting at a fraction of traditional costs
Burp Suite · Nessus · Acunetix
I built Cipher to fix that. It's an AI agent that reasons like an attacker — maps the target, finds vulnerabilities, chains them into exploits, and proves they're real. Every finding ships with a reproducible Python script. If the script doesn't break your system, we don't report it.
How it works: Cipher defines security invariants ("User A can't access User B's data"), then multiple agents attack in parallel to violate them. A separate judge agent tries to disprove every finding — if it can't reproduce the exploit 3 times, the finding dies. You never see it.
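As a rough illustration of the judge step described above, here is a minimal sketch, not Cipher's actual code. It assumes each finding carries a callable that re-runs the exploit and returns True when the invariant was violated, and it reads "reproduce the exploit 3 times" as "all three attempts must succeed". The names `Finding`, `judge`, and `filter_findings` are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, List

# From the post: a finding must reproduce 3 times or it is dropped.
REQUIRED_REPRODUCTIONS = 3


@dataclass
class Finding:
    # Invariant the attack claims to violate,
    # e.g. "User A can't access User B's data".
    invariant: str
    # Hypothetical hook: re-runs the exploit and returns True
    # if the invariant was violated on this attempt.
    exploit: Callable[[], bool]


def judge(finding: Finding, attempts: int = REQUIRED_REPRODUCTIONS) -> bool:
    """Return True only if the exploit succeeds on every attempt."""
    for _ in range(attempts):
        try:
            if not finding.exploit():
                return False  # one failed reproduction kills the finding
        except Exception:
            return False  # a crashing exploit script is not a proven exploit
    return True


def filter_findings(findings: List[Finding]) -> List[Finding]:
    """Keep only the findings that survive the judge."""
    return [f for f in findings if judge(f)]
```

Under these assumptions, a flaky exploit that works once and then fails never reaches the report, which matches the "You never see it" behavior described in the post.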
$999 per assessment. Results in ~2 hours. Unlimited retesting.
Honest limitations: complex multi-step auth flows (SSO with MFA) still need manual setup, such as providing JWT credentials. We're working on it.
I'll run Cipher free for the first 15 HN readers who want to try it. Drop your email or sign up at https://apxlabs.ai/. Happy to answer any questions about the approach.
The UI turns complex attack chains into an immediately scannable graph with per-path metrics (risk score, time-to-compromise, assets/credentials impacted), which is great for threat modeling and tabletop drills. It feels more like a very polished breach and attack simulation (BAS) visualization than a novel research tool; what I want to know next is where the simulation inputs come from (real telemetry, vulnerability feeds, or canned scenarios).
CTF-style flags for voice prompt injection make learning LLM security actually fun.
ZK proofs verify exploit possession without leaking details until a time-locked deadline.
Delegation chains with accumulating caveats narrow authority at each agent hop.
NPM supply chain scanner competing against Socket, Snyk, and npm audit.