"Be horse." – a diffusion language model on an M2 Air

by encrux · Apr 30, 2026 · 10 points · 2 comments

AI Analysis

●● Solid · Big Brain · Niche Gem

Parallel token decoding could beat autoregressive LLMs on throughput, if the math holds up.

Strengths
  • Feeding the mask probability to the model as an input lets one network handle every noise level.
  • Training loop visualizations clearly explain the cross-entropy loss on masked positions.
  • Runs entirely on consumer M2 hardware without needing enterprise GPU clusters.
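
The first two strengths can be illustrated with a minimal sketch of one masked-diffusion training step. This is not the post's code: the names `MASK`, `corrupt`, `masked_cross_entropy`, `uniform_model`, and `training_step` are hypothetical, the noise range is an assumed choice, and a uniform predictor stands in for the real network. It only shows the shape of the idea: sample a mask probability, mask tokens at that rate, pass the mask probability to the model along with the noisy sequence, and score cross-entropy on the masked positions only.

```python
import math
import random

MASK = -1  # hypothetical id for the mask token (assumption, not from the post)


def corrupt(tokens, mask_prob, rng):
    """Replace each token with MASK independently with probability mask_prob."""
    return [MASK if rng.random() < mask_prob else t for t in tokens]


def masked_cross_entropy(log_probs, targets, noisy):
    """Mean negative log-likelihood of the true token, over masked positions only."""
    losses = [
        -log_probs[i][targets[i]]
        for i in range(len(targets))
        if noisy[i] == MASK
    ]
    return sum(losses) / len(losses) if losses else 0.0


def uniform_model(noisy, mask_prob, vocab_size):
    """Stand-in for the network: it receives the noisy sequence AND mask_prob,
    but simply returns uniform log-probabilities for every position."""
    return [[-math.log(vocab_size)] * vocab_size for _ in noisy]


def training_step(tokens, vocab_size, rng):
    """One illustrative step: sample a noise level, corrupt, predict, score."""
    mask_prob = rng.uniform(0.05, 1.0)  # assumed noise range
    noisy = corrupt(tokens, mask_prob, rng)
    log_probs = uniform_model(noisy, mask_prob, vocab_size)
    return mask_prob, noisy, masked_cross_entropy(log_probs, tokens, noisy)
```

Because the mask probability is an argument to the model, the same weights can in principle be queried at any noise level during sampling; a real implementation would replace `uniform_model` with a trained network and backpropagate through the loss.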
Weaknesses
  • No benchmarks comparing tokens-per-second against standard autoregressive baselines.
  • Output quality looks poor in the demo, suggesting it is more a proof of concept than a usable model.
Category
Target Audience

ML researchers and hobbyists interested in alternative LLM architectures

Similar To

Mercury · LLaDA · Diffusion-LM

Post Description

Be horse.
