GitHub Repository

Open, model-agnostic benchmark for prompt-injection detectors — scored on both axes (attack catch-rate and false positives on real traffic), threshold-agnostic, and reproducible from raw scores.

0 stars · Python

An open source benchmark for prompt-injection detectors

Author: gugit

by gugit · Jun 29, 2026 · 2 points · 0 comments


AI Analysis

●● Solid · Big Brain

Dual-axis measurement that compares detectors at the same catch rate rather than at arbitrary thresholds.
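To make "same catch rate" concrete, here is a minimal Python sketch of one way such a comparison could be computed from raw scores. It is not taken from the repository: the function name `fpr_at_catch_rate`, the convention that higher scores mean "more likely an injection", and the toy scores are all assumptions for illustration.

```python
import math


def fpr_at_catch_rate(attack_scores, benign_scores, target_catch_rate):
    """Return (false_positive_rate, threshold) at a fixed attack catch rate.

    Hypothetical helper, not the benchmark's API. Assumes a higher score
    means the detector considers the input more likely to be an injection.
    """
    if not attack_scores or not benign_scores:
        raise ValueError("both score lists must be non-empty")
    if not 0 < target_catch_rate <= 1:
        raise ValueError("target_catch_rate must be in (0, 1]")

    # Highest threshold that still flags at least the target share of attacks.
    ranked = sorted(attack_scores, reverse=True)
    needed = math.ceil(target_catch_rate * len(ranked))
    threshold = ranked[needed - 1]

    # Share of benign traffic flagged at that same threshold.
    flagged = sum(1 for score in benign_scores if score >= threshold)
    return flagged / len(benign_scores), threshold


# Toy scores, invented for illustration only.
detector_a = fpr_at_catch_rate([0.9, 0.8, 0.7, 0.2], [0.1, 0.3, 0.75, 0.05], 0.75)
detector_b = fpr_at_catch_rate([0.95, 0.6, 0.55, 0.1], [0.2, 0.3, 0.4, 0.5], 0.75)
print(detector_a)  # (0.25, 0.7)
print(detector_b)  # (0.0, 0.55)
```

Both toy detectors are pinned to a 75% catch rate, so the false-positive rates can be compared directly even though the two thresholds differ. That is the sense in which a comparison like this does not depend on any vendor-chosen cutoff.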

Strengths

• Measures false positives on real traffic, not just attack detection rates
• Threshold-agnostic comparison prevents gaming via tuned cutoff points
• Author discloses commercial interest and includes their own model's weaknesses

Weaknesses

• Zero stars and a fresh repo mean no community adoption yet
• The benchmark is maintained by a detector vendor, which creates an inherent conflict of interest

Similar Projects

Security · ●●● Banger

Prompt injection detector beats ProtectAI by 19% accuracy, 8.9x smaller

Beats ProtectAI by 19% accuracy and runs 9x smaller on CPU.

Dark Horse · Solve My Problem

Karan047

322mo ago

Security · ●● Solid

Wolf Defender, an open-weight prompt-injection detection model

Outperforms existing open-source injection detectors on ProtectAI and Qualifire benchmarks.

Niche Gem · Dark Horse

patronusprotect

203mo ago

Security · ●●● Banger

Deep-XPIA – Prompt injection benchmark for multi-agent AI systems

Maps cross-agent injection attacks to real Copilot CVEs with live measurements.

Big Brain · Dark Horse

leo_agent

3014d ago

AI/ML · ●●● Banger

AI image models hallucinate history, we built a method to fix it

Naive prompts hallucinate history; structured knowledge injection raises accuracy from 12.5% to 83.3%.

Big Brain · Wizardry · Solve My Problem

MysticBirdie

123mo ago

AI/ML · ● Mid

SeeVideo – A web-first workspace to benchmark Seedance 2.0 vs. Kling 3.0

Third-party hub for Seedance 2.0 vs. Kling 3.0 side-by-side comparison when models are scattered across apps.

Eye Candy

naxtsass

103mo ago

AI/ML · ●●● Banger

AWB – Benchmark that tests your AI coding workflow, not just the model

Tests workflow + tool + model together, not just model capability like SWE-bench.

Big Brain · Zero to One

xmpuspus

103mo ago