AI/ML●●●Banger
LLM Sycophancy Benchmark: Opposite-Narrator Contradictions
Opposite-narrator test catches models agreeing with both sides of same dispute.
Big BrainDark Horse
zone411
303mo ago

Fast.com for LLMs, but you could script this with curl in five minutes.
Developers evaluating LLM API providers
Artificial Analysis · LLM Perf Leaderboard · Chatbot Arena
Opposite-narrator test catches models agreeing with both sides of same dispute.
Research article revealing few-shot collapse patterns, not a usable tool or product.
RFC 3339 hits 88% accuracy while unix epoch fails 50% of the time.
Bufferbloat grading reveals Zoom lag you'd miss on Speedtest; Bubble Tea UI delight.
Yet another prompt benchmarking UI when Promptfoo and LangSmith already exist.
Intercepts HTTP transport layer so production code needs zero changes.