Steerling-8B, a language model that can explain any token it generates

by adebayoj·Feb 24, 2026·328 points·91 comments

AI Analysis

●●●● · Gem · Wizardry · Zero to One · Big Brain

The first LLM with per-token interpretability, tracing each generated token back to its input, concepts, and training-data provenance.

Strengths
  • Per-token attribution to input, concepts, and training data, a genuinely novel interpretability layer.
  • Concept-level steering at inference time replaces safety fine-tuning, reducing iteration cost.
  • Performance stays within 2–7x scaling parity despite the transparency overhead, which validates the architecture.
Weaknesses
  • Limited to 8B parameters; it is unclear whether the approach scales to 70B+ models competitive with frontier systems.
  • Training data attribution depends on accurate source labeling; methodology not fully detailed.
Target Audience

ML researchers, AI safety engineers, interpretability practitioners

Similar Projects

Developer Tools · ●● · Solid

npx claude-traces, a visualizer for Claude Code/Agent SDK traces

Runs with a single npx command and immediately surfaces a helpful timeline view with token counts, tool I/O panes, and subagent nesting: exactly the visibility you want when an agent goes off the rails. It cleverly reads the local ~/.claude/projects traces, so setup is trivial, but its usefulness is limited because it is Claude-only and local. Adding search/aggregation or a team-sharing mode would push it up a tier.

Niche Gem · Solve My Problem · Slick
hahawhatsgood
203mo ago