I logged Gemini's stock predictions for 38 days to study LLM drift
Rigorous 38-day Gemini drift study with citation-mapped predictions and confidence scores.

Names compression-step hallucination, but it's a paper, not a tool you can use.
ML researchers, VLM developers, AI safety researchers
Chain-of-thought papers · LangChain tracing · Anthropic Constitutional AI research
62k-puzzle benchmark reveals reasoning depth, cost variance, and stark gaps between US and Chinese models.
Daily arXiv scraping with Claude classification beats manual curation.
Mafia as a benchmark, with a mechanism for learning between batches; sessions are public and inspectable.
Document-parsing A/B test arena with Elo ranking: a niche but real alternative to OCR Arena.
MuJoCo physics meets Gemini reasoning entirely inside your browser tab.