Wolf Defender, an open-weight prompt-injection detection model
Outperforms existing open-source injection detectors on ProtectAI and Qualifire benchmarks.

100% sycophancy detection on Psychosis-Bench; runs locally on a gaming GPU.
AI safety researchers, LLM developers, alignment engineers
It's small enough to run locally on a gaming GPU. There's a GGUF checkpoint on Hugging Face, and the model is available on Ollama. You can pull it and run scenarios against it in minutes: https://ollama.com/izzie/sycofact
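As a rough sketch of what "pull it and run scenarios" could look like from Python, the snippet below posts one scenario to a local Ollama server over its HTTP generate endpoint. The model tag `izzie/sycofact` is inferred from the URL above, the endpoint is Ollama's default local address, and the scenario text is a made-up placeholder. The input format the model actually expects is not documented here, so treat the prompt shape as an assumption.

```python
import json
import urllib.request

# Assumption: Ollama's default local endpoint; change it if your server listens elsewhere.
OLLAMA_URL = "http://localhost:11434/api/generate"
# Assumption: model tag inferred from https://ollama.com/izzie/sycofact
MODEL = "izzie/sycofact"


def build_request(scenario: str, model: str = MODEL,
                  url: str = OLLAMA_URL) -> urllib.request.Request:
    """Build a non-streaming generate request for a single scenario."""
    payload = {"model": model, "prompt": scenario, "stream": False}
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def run_scenario(scenario: str, timeout: float = 120.0) -> str:
    """Send the scenario to a running local Ollama server and return the raw model text."""
    with urllib.request.urlopen(build_request(scenario), timeout=timeout) as resp:
        body = json.load(resp)
    # With "stream": False, the generated text arrives in the "response" field.
    return body["response"]
```

With the model pulled (`ollama pull izzie/sycofact`) and the Ollama server running, `run_scenario("...")` returns whatever text the model emits for that scenario. Parsing that text into a score is left out because the output format is not described here.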
The synthetic training data is also public, so you can train other models on it or reproduce my results. The labels were all generated by Gemma 3 27B with activation steering based on generated contrastive data. A write-up is planned for a later date; feel free to get in touch if you're curious.
Native ternary training beats post-training quantization for memory efficiency.
Native multilingual training covers GDPR Article 9 categories others skip.
Detects sycophancy and jailbreak drift in LLMs without needing model weights.
Streams LLM weights from CD-ROM during inference to fit 77 MB models in 32 MB of RAM.
Deterministic fingerprinting for model structure without loading weights.