Security ●● Solid
BreakMyAgent – Open-source red-teaming sandbox for LLM system prompts
LLM-as-Judge red-teaming for system prompts, but Anthropic/OpenAI already ship this internally.
Solve My Problem · Ship It
breakmyagent
203mo ago

Autonomous exploit validation with real command execution is genuinely wild for a local tool.
Security researchers, penetration testers
Burp Suite · Metasploit · Nuclei
Genetic algorithms meet LLM personas to stress-test landing page copy.
Rediscovers Kepler's laws and GR equations from raw data without LLM hallucination.
Replaces stitching Langfuse and promptfoo together with one unified eval dashboard.
Searchable directory for llms.txt files when general search engines could index these.
Social deduction games test deception and theory of mind better than standard benchmarks.