MemOperator-4B
Specialized memory model that beats GPT-4o-mini on the LoCoMo benchmark while running locally.

Beats utility forecasts on 6 of 7 RTOs using only public EIA data and open models.
Energy analysts, data scientists, grid operators
EIA-930 · Grid forecasting tools · Energy market analytics platforms
On a 2025 hold-out (~61,000 hours), it beats the operators' own day-ahead submissions to EIA (the production forecasts they use to schedule generation) on 6 of 7 major RTOs. Macro-averaged MAE is ~40% lower. The one loss is ISO-NE, whose forecasting is just very good (24h-ahead MASE 0.34). On the same window, the CAISO and SPP operator submissions did worse than a "same as yesterday" baseline.
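For readers unfamiliar with the metrics, here is a minimal sketch of MAE and a MASE-style ratio against a "same as yesterday" baseline. It is not taken from the surge repository. The function names, the 24-hour season, and the choice to scale by the naive forecast's error on the same evaluation window are all assumptions made for illustration, and the project's benchmark script may define MASE differently.

```python
from typing import Sequence


def mae(actual: Sequence[float], forecast: Sequence[float]) -> float:
    """Mean absolute error between two equal-length series."""
    if len(actual) != len(forecast) or len(actual) == 0:
        raise ValueError("series must be non-empty and the same length")
    return sum(abs(a - f) for a, f in zip(actual, forecast)) / len(actual)


def seasonal_naive(history: Sequence[float], horizon: int, season: int = 24) -> list[float]:
    """'Same as yesterday': repeat the last `season` observed values."""
    if len(history) < season:
        raise ValueError("need at least one full season of history")
    last = list(history[-season:])
    return [last[i % season] for i in range(horizon)]


def mase(actual: Sequence[float], forecast: Sequence[float], naive: Sequence[float]) -> float:
    """Forecast MAE divided by the naive baseline's MAE (assumed scaling)."""
    return mae(actual, forecast) / mae(actual, naive)


# Toy hourly load in arbitrary units (illustrative, not EIA data).
yesterday = [100.0 + h for h in range(24)]
today = [y + 5.0 for y in yesterday]   # actual load is 5 units higher
model = [t - 1.0 for t in today]       # hypothetical forecast, off by 1 unit
naive = seasonal_naive(yesterday, horizon=24)

score = mase(today, model, naive)      # 1.0 / 5.0
```

Under this definition a score below 1 means the forecast beats "same as yesterday", and a score above 1 means it does worse, which is the situation reported for the CAISO and SPP submissions.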
The site plots the median + 80% PI band against the operator submission, with 48h of actuals running into the forecast.
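As a rough illustration of what a median plus 80% prediction-interval band means, the sketch below derives the 10th, 50th, and 90th percentiles from a set of sampled forecasts for one hour, then checks how often actuals fall inside the band. It assumes the forecast is available as samples. The model may instead emit quantiles directly, and the helper names here are hypothetical.

```python
import statistics
from typing import Sequence


def interval_from_samples(samples: Sequence[float]) -> tuple[float, float, float]:
    """Return (p10, median, p90) for one forecast hour from sampled values."""
    # n=10 yields the nine decile cut points: index 0 is p10, 4 is p50, 8 is p90.
    deciles = statistics.quantiles(samples, n=10, method="inclusive")
    return deciles[0], deciles[4], deciles[8]


def coverage(actuals: Sequence[float], lows: Sequence[float], highs: Sequence[float]) -> float:
    """Fraction of actual values that land inside [low, high]."""
    hits = sum(lo <= a <= hi for a, lo, hi in zip(actuals, lows, highs))
    return hits / len(actuals)


# Toy example: 101 evenly spaced samples for a single hour.
low, median, high = interval_from_samples(list(range(101)))
```

A well-calibrated 80% band should contain the actual value in roughly 80% of hours, so `coverage` over a hold-out window is a quick sanity check on the band's width.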
The code and model (on Hugging Face) are public, and the operator-comparison benchmark reproduces from a single script:
- https://github.com/tylergibbs1/surge
- https://huggingface.co/Tylerbry1/surge-fm-v3
Beats GPT-5 at golf forecasting via auto-labeled data pipeline; replicable recipe for any domain via SDK.
Unified memory trick lets a 2B model beat 12B; trains on MacBook with zero cloud costs.
Shard-based scheduling cuts GPU wait time, though Ray Tune offers similar early stopping.
Galaxy classification model, but model card has mostly empty fields.
Eval-synthesize-train loop automates custom model development better than manual fine-tuning.