SycoFact 4B: Open model detecting sycophancy and delusion confirmation

by iwalton3·Mar 30, 2026·2 points·0 comments

AI Analysis

●●●● Gem · Big Brain · Zero to One · Wizardry

Rejects 100% of the sycophantic, delusion-affirming responses in Psychosis-Bench; runs locally on a gaming GPU.

Strengths
  • Solves genuine AI safety failure mode that wasn't tractable before.
  • Public training data and GGUF checkpoint enable reproducibility and fine-tuning.
  • Competitive with GPT-4 on safety benchmarks at 1/50th the parameter count.
Weaknesses
  • Niche audience limits adoption outside AI safety and alignment research.
  • Write-up planned but not yet published for full methodology transparency.
Target Audience

AI safety researchers, LLM developers, alignment engineers

Post Description

I published a model you can use now to help detect sycophantic AI responses before they harm users. It rejects 100% of the sycophantic, delusion-affirming responses in Psychosis-Bench. It also does well on AISI Harmful Advice, PKU-SafeRLHF, and the safety subsets of RewardBench.

It's small enough to run locally on a gaming GPU. There is a GGUF checkpoint on Hugging Face, and the model is available on Ollama, so you can pull it and run scenarios against it in minutes: https://ollama.com/izzie/sycofact
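For anyone who wants to script scenarios rather than type them in by hand, here is a minimal sketch that talks to a locally running Ollama server over its standard `/api/generate` endpoint. It assumes the model has already been pulled under the tag `izzie/sycofact` (inferred from the URL above) and that the server is listening on Ollama's default port. The prompt layout in `build_payload` is a placeholder, not the model's documented input format, so check the model card before relying on it.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # Ollama's default local endpoint
MODEL = "izzie/sycofact"  # assumed tag, taken from the ollama.com URL in the post


def build_payload(user_message: str, ai_response: str) -> dict:
    """Build a non-streaming generate request for one scenario.

    The prompt layout is hypothetical; the model may expect a different
    template, so treat this as a starting point only.
    """
    prompt = f"User: {user_message}\nAssistant: {ai_response}"
    return {"model": MODEL, "prompt": prompt, "stream": False}


def evaluate(user_message: str, ai_response: str, url: str = OLLAMA_URL) -> str:
    """Send one scenario to the local Ollama server and return the raw model text."""
    data = json.dumps(build_payload(user_message, ai_response)).encode("utf-8")
    request = urllib.request.Request(
        url, data=data, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(request, timeout=120) as reply:
        # With "stream": False, Ollama returns a single JSON object whose
        # "response" field holds the generated text.
        return json.loads(reply.read())["response"]
```

Calling `evaluate(user_message, ai_response)` with a user turn and a candidate AI reply returns whatever text the model produces; how that text should be parsed into a verdict depends on the model's actual output format, which the post does not describe.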

The synthetic training data is also public, so you can train other models on it or reproduce my results. All labels were generated by Gemma 3 27B with activation steering based on generated contrastive data. A write-up is planned for a later date; feel free to get in touch if you're curious.
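Since the write-up is not out yet, the exact steering method is unknown. As a rough illustration of what "activation steering from contrastive data" usually means, the sketch below shows the common difference-of-means approach: average the hidden activations for two contrasting sets of examples, subtract one mean from the other, and add a scaled copy of that direction to a hidden state. All function names are hypothetical, the vectors are plain Python lists standing in for real model activations, and this is not necessarily what was done here.

```python
def mean_vector(rows):
    """Element-wise mean of a list of equal-length activation vectors."""
    count = len(rows)
    return [sum(column) / count for column in zip(*rows)]


def steering_vector(positive, negative):
    """Difference of means between two contrastive sets of activations."""
    positive_mean = mean_vector(positive)
    negative_mean = mean_vector(negative)
    return [p - n for p, n in zip(positive_mean, negative_mean)]


def steer(hidden, direction, strength=1.0):
    """Shift one hidden state along the steering direction."""
    return [h + strength * d for h, d in zip(hidden, direction)]
```

In a real setup the activations would come from a chosen layer of the labelling model, and `strength` would be tuned empirically.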
