Trained an LLM to predict "What will Trump do?"


by bturtel·Feb 20, 2026·10 points·2 comments

AI Analysis

●● Solid · Big Brain · Niche Gem

Beats GPT-5 on calibration via GRPO with auto-labeled news data.

Strengths
  • Fully automated pipeline: search query → labeled dataset → RL training with zero manual annotation
  • Strong evaluation methodology (Brier score, BSS, ECE) proves calibration edge, not just accuracy
  • Open source model and dataset immediately reproducible; domain-agnostic approach scales to any forecasting task
Weaknesses
  • Narrow use case: forecasting Trump actions lacks broad applicability despite claims of generalizability
  • Only 2,108 training questions about one person; unclear how the method performs on other political figures or domains
Target Audience

ML researchers, forecasting enthusiasts, policy analysts

Similar To

Metaculus · Kalshi · Samotsvety Forecasts

Post Description

Hey HN! I RL-tuned an open-source LLM (gpt-oss-120b — 120B MoE, but only 5.1B active params) to predict "What will Trump do?" in any situation, trained on nothing but public news collected automatically from search queries. The trained model beats GPT-5, and both dataset and trained model are open sourced.

Data generation: Generated 2,108 binary forecasting questions from just a search query and a date range using the Lightning Rod SDK (https://github.com/lightning-rod-labs/lightningrod-python-sd...). Questions are generated from historical news articles (for example, "Will Trump impose 25% tariffs on Mexico by March 1?") and resolved by checking what actually happened after the deadline. No human annotation: the whole pipeline is automated.
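To make the shape of the data concrete, here is a minimal sketch of what one generated question might look like as a record. This is not the Lightning Rod SDK API: the `ForecastQuestion` class, its field names, the helper functions, and the dates and outcome values in the usage lines are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional, Tuple


@dataclass
class ForecastQuestion:
    """Hypothetical record for one binary forecasting question."""
    text: str                      # question generated from a news article
    asked_on: date                 # publication date of the source article
    deadline: date                 # date by which the event must happen
    outcome: Optional[int] = None  # 1 = yes, 0 = no, None = not yet resolved


def is_resolvable(q: ForecastQuestion, today: date) -> bool:
    """A question can only be labeled once its deadline has passed."""
    return q.deadline < today


def training_examples(questions: List[ForecastQuestion]) -> List[Tuple[str, int]]:
    """Keep only resolved questions, as (prompt, label) pairs."""
    return [(q.text, q.outcome) for q in questions if q.outcome is not None]


# Illustrative usage; the dates and the outcome are placeholders, not real data.
qs = [
    ForecastQuestion("Will X happen by March 1?", date(2025, 1, 15), date(2025, 3, 1), outcome=0),
    ForecastQuestion("Will Y happen by June 1?", date(2025, 4, 2), date(2025, 6, 1)),
]
print(training_examples(qs))
```

The point of the sketch is that the label comes from what happened after the deadline, so only questions whose deadline has already passed can be used for training.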

Training: GRPO with Brier score as the reward signal. LoRA rank 32, 50 training steps.
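A rough sketch of how a Brier-score reward could plug into GRPO-style training, under some assumptions the post does not confirm: that the model ends its answer with a line like `Probability: 0.72`, that unparseable answers get the worst possible reward, and that advantages are normalized within each group of sampled completions using the population standard deviation. All function names here are hypothetical.

```python
import re
import statistics
from typing import List, Optional


def parse_probability(completion: str) -> Optional[float]:
    """Pull a probability out of the completion (assumed output format)."""
    match = re.search(r"Probability:\s*([01](?:\.\d+)?)", completion)
    if match is None:
        return None
    p = float(match.group(1))
    return p if 0.0 <= p <= 1.0 else None


def brier_reward(prob: float, outcome: int) -> float:
    """Negative squared error, so a higher reward means a better forecast."""
    return -((prob - outcome) ** 2)


def completion_reward(completion: str, outcome: int) -> float:
    """Reward for one sampled completion; -1.0 (worst case) if unparseable."""
    p = parse_probability(completion)
    return -1.0 if p is None else brier_reward(p, outcome)


def group_advantages(rewards: List[float], eps: float = 1e-6) -> List[float]:
    """GRPO-style advantage: each reward relative to its group's mean and spread."""
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)
    return [(r - mean) / (std + eps) for r in rewards]


# One question that resolved "yes" (outcome = 1), four sampled completions.
samples = ["Probability: 0.9", "Probability: 0.6", "Probability: 0.2", "not sure"]
rewards = [completion_reward(s, 1) for s in samples]
print(group_advantages(rewards))
```

Because the Brier score is a proper scoring rule, the expected reward is maximized by reporting the true probability, which is why it is a natural reward when the goal is calibration rather than just picking the right side.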

Results: Slight edge over GPT-5 on Brier score (0.194 vs 0.200), with a larger relative gain in calibration: the RL-tuned model produces better-calibrated probabilities (ECE 0.079 vs 0.091, roughly 13% lower).
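For readers unfamiliar with the two metrics, here is a minimal sketch of how they are commonly computed for binary forecasts. The post does not say how ECE was binned, so the 10 equal-width bins below are an assumption, and the function names are illustrative.

```python
from typing import List


def brier_score(probs: List[float], outcomes: List[int]) -> float:
    """Mean squared error between forecast probabilities and 0/1 outcomes."""
    return sum((p - y) ** 2 for p, y in zip(probs, outcomes)) / len(probs)


def expected_calibration_error(
    probs: List[float], outcomes: List[int], n_bins: int = 10
) -> float:
    """Weighted average gap between mean forecast and observed frequency per bin."""
    bins: List[List[tuple]] = [[] for _ in range(n_bins)]
    for p, y in zip(probs, outcomes):
        idx = min(int(p * n_bins), n_bins - 1)  # p == 1.0 goes in the last bin
        bins[idx].append((p, y))

    total = len(probs)
    ece = 0.0
    for bucket in bins:
        if not bucket:
            continue
        mean_prob = sum(p for p, _ in bucket) / len(bucket)
        observed = sum(y for _, y in bucket) / len(bucket)
        ece += (len(bucket) / total) * abs(mean_prob - observed)
    return ece


# Toy forecasts, not data from the post.
probs = [0.9, 0.9, 0.1, 0.1]
outcomes = [1, 1, 0, 0]
print(brier_score(probs, outcomes), expected_calibration_error(probs, outcomes))
```

Lower is better for both. Brier score mixes discrimination and calibration, while ECE isolates calibration: it asks whether events forecast at around 70% actually happen about 70% of the time.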

Dataset: https://huggingface.co/datasets/LightningRodLabs/WWTD-2025

This is a fully automated way to spin up domain expert LLMs from public web data with just a few search queries, no labeling/annotation required.

I’d love any feedback, or suggestions for what domain expert to train next!
