GitHub Repository

LLM benchmark and leaderboard for narrator-bias sycophancy, opposite-narrator contradictions, and judgment consistency.

37 stars

LLM Sycophancy Benchmark: Opposite-Narrator Contradictions

Name: LLM Sycophancy Benchmark: Opposite-Narrator Contradictions
Availability: InStock
Author: zone411

by zone411·Mar 10, 2026·3 points·0 comments

Visit Project View on HN

AI Analysis

●●●BangerBig BrainDark Horse

Opposite-narrator test catches models agreeing with both sides of same dispute.

Strengths

•Strict metric counts sycophancy only when model sides with both opposing narrators
•Live leaderboard compares Gemini, GPT, Claude, and Grok
•Open-source repo with clear methodology and reproducible tests

Weaknesses

•Niche audience limited to AI safety researchers
•Benchmark scope focused only on narrator-bias contradictions

Similar Projects

AI/ML●●Solid

ErrataBench - A Proofreading Benchmark for LLMs

51 models, 1613 runs, $558 spent — finally proofreading benchmarks with real numbers.

Niche GemBig Brain

artursapek

301mo ago

AI/ML●●Solid

AI/ML benchmark for local LLM inference and XGBoost training on GPU/CPU

One-command benchmark suite comparing Ollama and XGBoost performance with a shared Streamlit dashboard.

Solve My ProblemNiche Gem

albedan

2019d ago

Developer Tools●●●Banger

AgentDX – Open-source linter and LLM benchmark for MCP servers

First linter + benchmark for MCP servers; catches vague schemas before LLMs pick wrong tools.

Solve My ProblemNiche GemBig Brain

yamarldfst

103mo ago

AI/ML●Mid

Tested 12 LLMs with few-shot examples

Research article revealing few-shot collapse patterns, not a usable tool or product.

Dark Horse

shuntaro-okuma

202mo ago

Data●●●Banger

We benchmarked 18 LLMs on OCR (7K+ calls) – cheaper models win

7,560 runs proving cheaper models beat expensive ones on production OCR tasks.

Big BrainSolve My Problem

TimoKerr

511mo ago

AI/ML●●●Banger

Datetime-bench: which datetime formats LLMs get right (and wrong)

RFC 3339 hits 88% accuracy while unix epoch fails 50% of the time.

Solve My ProblemDark Horse

diwank

212mo ago