Open-source LLM and dataset for sports forecasting (Pro Golf)

by bturtel · Feb 24, 2026 · 7 points · 0 comments

AI Analysis

●●● Banger · Big Brain · Zero to One

Beats GPT-5 at golf forecasting via auto-labeled data pipeline; replicable recipe for any domain via SDK.

Strengths
  • Future-as-Label data generation (search queries → auto-labeled questions from news) removes the manual domain-annotation bottleneck
  • Measurable lift vs GPT-5 (Brier Skill +17% vs +12.8%, ECE 6% vs 10.6%) on held-out test set with clear methodology
  • Reproducible architecture: SDK templates + GRPO + LoRA recipe lets anyone build domain-specific forecasters without ML expertise
Weaknesses
  • Evaluation on only 855 held-out questions (Aug 2025+) is narrow; generalization to other domains is unproven, and golf may be particularly suited to news-driven labeling
  • Requires gpt-oss-120b base model; dependency on third-party foundation model limits positioning as standalone offering
Category
Target Audience

ML researchers, domain specialists building specialized forecasting models, sports analysts

Similar To

Hugging Face fine-tuning · OpenAI GPT-5 · Domain-specific model marketplaces

Post Description

Hey HN, I fine-tuned a small open-source model on golf forecasting and it beats GPT-5 at predicting golf outcomes. The same approach can be used to build a specialized model in any domain; you just need to update a few search queries.

We fine-tuned gpt-oss-120b with LoRA on 3,178 golf forecasting questions, using GRPO with Brier score as the reward.
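
For readers unfamiliar with the setup, here is a minimal sketch of how a Brier-based reward can feed GRPO's group-relative advantages. The post does not give the exact reward formula, so the negative squared error below is an assumption, and the function names are hypothetical rather than taken from the training code.

```python
from statistics import mean, pstdev


def brier_reward(prob_yes: float, outcome: int) -> float:
    """Assumed reward: negative squared error, so better forecasts score higher."""
    p = min(max(prob_yes, 0.0), 1.0)  # clamp malformed probabilities into [0, 1]
    return -((p - outcome) ** 2)


def group_relative_advantages(rewards: list[float], eps: float = 1e-8) -> list[float]:
    """Normalize each reward against the other samples drawn for the same question."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


# Four sampled forecasts for one question that resolved "yes" (outcome = 1).
sampled_probs = [0.9, 0.6, 0.5, 0.2]
rewards = [brier_reward(p, 1) for p in sampled_probs]
advantages = group_relative_advantages(rewards)

for p, r, a in zip(sampled_probs, rewards, advantages):
    print(f"p={p:.1f}  reward={r:.3f}  advantage={a:+.3f}")
```

Because the Brier score is a proper scoring rule, the policy is rewarded for reporting its honest probability rather than pushing every answer toward 0 or 1.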

Our model outperformed GPT-5 on Brier Skill (17% vs 12.8%) and ECE (6% vs 10.6%) on 855 held-out questions.
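
To make those two metrics concrete, here is a rough sketch of how they are commonly computed. The post does not say which reference forecast the Brier Skill figures use or how ECE is binned, so the base-rate reference and the 10 equal-width bins below are assumptions, and the toy data is illustrative only.

```python
def brier_score(probs, outcomes):
    """Mean squared error between forecast probabilities and 0/1 outcomes."""
    return sum((p - y) ** 2 for p, y in zip(probs, outcomes)) / len(probs)


def brier_skill_score(probs, outcomes):
    """Improvement over a reference forecast (assumed here: the base rate)."""
    base_rate = sum(outcomes) / len(outcomes)
    reference = brier_score([base_rate] * len(outcomes), outcomes)
    if reference == 0:
        raise ValueError("reference Brier score is zero; skill score is undefined")
    return 1.0 - brier_score(probs, outcomes) / reference


def expected_calibration_error(probs, outcomes, n_bins=10):
    """Weighted average gap between mean confidence and observed frequency per bin."""
    bins = [[] for _ in range(n_bins)]
    for p, y in zip(probs, outcomes):
        idx = min(int(p * n_bins), n_bins - 1)  # p == 1.0 falls in the last bin
        bins[idx].append((p, y))
    ece = 0.0
    for members in bins:
        if not members:
            continue
        avg_p = sum(p for p, _ in members) / len(members)
        freq = sum(y for _, y in members) / len(members)
        ece += len(members) / len(probs) * abs(avg_p - freq)
    return ece


# Toy data, not taken from the golf dataset.
probs = [0.8, 0.7, 0.3, 0.2]
outcomes = [1, 1, 0, 0]
print(f"Brier skill: {brier_skill_score(probs, outcomes):.3f}")
print(f"ECE:         {expected_calibration_error(probs, outcomes):.3f}")
```

Higher is better for Brier Skill (0 means no better than the reference), while lower is better for ECE.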

How to try it: the model and dataset are open-source, with code, on Hugging Face.

How to build your own specialized model: Update the search queries and instructions in the Lightning Rod SDK to generate a new forecasting dataset, then run the same GRPO + LoRA recipe.

SDK: https://github.com/lightning-rod-labs/lightningrod-python-sd...
Dataset: https://huggingface.co/datasets/LightningRodLabs/GolfForecas...
Model: https://huggingface.co/LightningRodLabs/Golf-Forecaster

Questions, feedback on the SDK, suggestions for new domains to try this on - all are welcome.

Similar Projects