SycoFact 4B: Open model detecting sycophancy and delusion confirmation

by iwalton3 · Mar 30, 2026 · 2 points · 0 comments

AI Analysis

●●●● Gem · Big Brain · Zero to One · Wizardry

Rejects 100% of the sycophantic, delusion-affirming responses in psychosis-bench and runs locally on a gaming GPU.

Strengths

• Targets a genuine AI safety failure mode: sycophantic responses that confirm a user's delusions.
• Public training data and a GGUF checkpoint enable reproduction and fine-tuning.
• Reported to do well on AISI Harmful Advice, PKU-SafeRLHF, and the safety subsets of RewardBench at only 4B parameters.

Weaknesses

• A niche audience limits adoption outside AI safety and alignment research.
• The write-up is planned but not yet published, so the full methodology is not yet documented.

Post Description

I published a model you can use now to help detect sycophantic AI responses before they harm users. It rejects 100% of the sycophantic, delusion-affirming responses from psychosis-bench. It also does well on AISI Harmful Advice, PKU-SafeRLHF, and the safety subsets of RewardBench.

It's small enough to run locally on a gaming GPU. There's a GGUF checkpoint on Hugging Face, and the model is available on Ollama. You can pull it and run scenarios against it in minutes: https://ollama.com/izzie/sycofact
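As a rough sketch of what running a scenario could look like: after `ollama pull izzie/sycofact`, a local Ollama server accepts requests on its standard `/api/generate` endpoint. The model name comes from the link above. The prompt layout and the helper names `build_payload` and `evaluate` are assumptions for illustration only, since the post does not document the input format the model expects or the shape of its output.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # Ollama's default local endpoint
MODEL = "izzie/sycofact"  # model name taken from the Ollama link in the post


def build_payload(user_message: str, ai_response: str) -> dict:
    """Build a non-streaming request body for Ollama's /api/generate.

    ASSUMPTION: this prompt layout is a placeholder. The post does not say
    how the model expects the conversation to be formatted.
    """
    prompt = (
        f"User message:\n{user_message}\n\n"
        f"AI response:\n{ai_response}\n"
    )
    return {"model": MODEL, "prompt": prompt, "stream": False}


def evaluate(user_message: str, ai_response: str, url: str = OLLAMA_URL) -> str:
    """Send one scenario to a local Ollama server and return the raw model text."""
    body = json.dumps(build_payload(user_message, ai_response)).encode("utf-8")
    request = urllib.request.Request(
        url, data=body, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(request) as reply:
        return json.loads(reply.read())["response"]


if __name__ == "__main__":
    # Requires a running Ollama server with the model already pulled.
    print(evaluate(
        "Everyone around me is an actor and only I have noticed.",
        "You're right, and you should trust no one.",
    ))
```

The raw text is returned unparsed on purpose; how to read a verdict out of it depends on the model's actual output format, which is described on the model page rather than in the post.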

The synthetic training data is also public, so you can train other models on it or reproduce my results. The labels were all generated by Gemma 3 27B with activation steering based on generated contrastive data. A write-up is planned for a later date; feel free to get in touch if you're curious.
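For readers unfamiliar with the term, one common form of activation steering derives a direction from contrastive examples by taking the difference of their mean activations, then adds a scaled copy of that direction to a hidden state. The toy sketch below illustrates only that general idea. It is not the author's pipeline: the post does not say which steering method, layer, or scale was used, and the function names and 3-dimensional vectors here are made up for illustration.

```python
from statistics import fmean


def steering_vector(positive, negative):
    """Difference of per-dimension means between two sets of activations."""
    dims = range(len(positive[0]))
    pos_mean = [fmean(row[d] for row in positive) for d in dims]
    neg_mean = [fmean(row[d] for row in negative) for d in dims]
    return [p - n for p, n in zip(pos_mean, neg_mean)]


def steer(hidden, vector, alpha=1.0):
    """Shift one hidden-state vector along the steering direction."""
    return [h + alpha * v for h, v in zip(hidden, vector)]


# Toy activations standing in for a real layer's hidden states (ASSUMPTION:
# real vectors would come from a model's forward pass on contrastive prompts).
honest = [[0.0, 1.0, 2.0], [0.0, 3.0, 2.0]]
sycophantic = [[1.0, 0.0, 2.0], [3.0, 0.0, 2.0]]

direction = steering_vector(honest, sycophantic)
steered = steer([0.5, 0.5, 0.5], direction, alpha=0.5)

print(direction)  # [-2.0, 2.0, 0.0]
print(steered)    # [-0.5, 1.5, 0.5]
```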