Pytest tests catching Python supply chain attacks (litellm .pth vector)
Catches .pth injection vectors from the litellm attack when Snyk and Dependabot miss them.
Auto-patching LLM calls to inject faults and log telemetry is a neat technical trick that lets you fuzz real agent runs without changing your stack. The repo ships six intentionally vulnerable example agents and a CLI (discover/run/ci) with eval packs for security and resilience, so you can reproduce attacks and gate releases. It feels like an early, practical toolkit that fills a gap in agent security testing — adoption and more community-playbooks will determine how far it goes.
ML engineers, security engineers, SREs and backend developers shipping AI agents
The repo includes 6 intentionally vulnerable example agents (support bot, SQL agent, code executor, payment processor, API agent, document processor) with real attack scenarios showing exactly how they break. Try breaking them yourself.
Three commands to test your own agent:
- pip install khaos-agent - khaos discover - khaos run my-agent --pack security
It works with raw OpenAI/Anthropic, Gemini, LangGraph, CrewAI, AutoGen — any Python agent. Khaos auto-patches LLM calls to inject faults and log telemetry. No cloud needed, runs 100% locally.
Some of what it tests:
- Prompt injection (policy bypass, developer mode exploits) - Tool misuse (unauthorized DB writes, unscoped API calls) - Data exfiltration (PII extraction, credential leakage) - Fault injection (timeouts, rate limits, malformed tool responses)
We are the first platform that focuses on testing the Agent's environment, not just the model in the harness.
Plus 4 tutorials using the free Gemini API if you want to learn without spending anything. Repo: https://github.com/ExordexLabs/khaos-sdk Examples: https://github.com/ExordexLabs/khaos-examples BSD licensed. v1.0 just shipped — the attack library and framework adapters are growing. What agents are you most worried about breaking?
Catches .pth injection vectors from the litellm attack when Snyk and Dependabot miss them.
Streaming guardrail catches semantic PII that regex misses — based on real LangChain issues.
Replaces slow VM tests with 11-second LXD container spins using ZFS snapshots.
60-second game exposing AI permission fatigue before you blindly approve everything.
42 prompt injection patterns detected in under 10ms with zero dependencies.
Synthetic e-commerce site with failure injection beats flaky live-site testing.