GitHub Repository

A Geometric Attention Transformer with the E8 Root System: Sovereign-Lila-E8 (Lie Lattice Attention Language Model)

10 stars · Jupyter Notebook

Lila-E8 – 40M Parameter LLM with 0.37 Loss via E8 Lattice Attention

by bootstraptor · Feb 24, 2026 · 1 point · 0 comments

AI Analysis

Mid · Big Brain · Wizardry

E8 lattice geometry replaces standard attention: clever math, but the 0.37 TinyStories loss needs context.

Strengths
  • Systematically applies higher-dimensional lattice geometry (the 240 roots of E8, which live in 8 dimensions) to attention
  • Demonstrates longer coherence (1500 tokens) than standard baselines without semantic collapse
  • Novel mathematical framing, not just parameter tweaking
Weaknesses
  • No comparison against other efficient small models (BitNet, MobileLLM) on the same benchmarks
  • TinyStories is a toy dataset; the real-world scaling benefit is unknown, and the claims lack peer review
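The E8 roots referred to in the strengths list are 240 vectors in 8-dimensional space. The following is a minimal stdlib sketch, independent of the project's code (the helper names `e8_roots` and `dot` are illustrative), that builds them in the standard coordinates and checks two textbook facts: every root has squared length 2, and any two roots have an inner product in {-2, -1, 0, 1, 2}.

```python
from itertools import combinations, product


def e8_roots():
    """Return the 240 roots of E8 as 8-tuples of floats (standard coordinates)."""
    roots = []
    # Type 1: two entries equal to +/-1, the rest 0 -> C(8, 2) * 4 = 112 roots.
    for i, j in combinations(range(8), 2):
        for si, sj in product((1.0, -1.0), repeat=2):
            v = [0.0] * 8
            v[i], v[j] = si, sj
            roots.append(tuple(v))
    # Type 2: all entries +/-1/2 with an even number of minus signs -> 2^7 = 128 roots.
    for signs in product((0.5, -0.5), repeat=8):
        if sum(1 for s in signs if s < 0) % 2 == 0:
            roots.append(signs)
    return roots


def dot(u, v):
    """Euclidean inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


roots = e8_roots()
print(len(roots))                                          # 240
print({dot(r, r) for r in roots})                          # {2.0}
print(sorted({dot(r, s) for r in roots for s in roots}))   # [-2.0, -1.0, 0.0, 1.0, 2.0]
```

The 240 roots are also the minimal nonzero vectors of the E8 lattice, which is why 240 is its kissing number.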
Category
Target Audience

ML researchers interested in attention mechanisms and geometric deep learning

Similar To

Mixture of Experts approaches · Efficient attention variants (FlashAttention)

Post Description

I’m excited to release Sovereign-Lila-E8, a novel transformer architecture that replaces standard attention mechanisms with a native E8 Root System Lattice. While the industry is brute-forcing intelligence with trillions of parameters, I went "outside" the system to find a zero-viscosity solution.

I built Sovereign-Lila-E8 because I wanted to see if we could bypass the 'viscosity' of standard attention mechanisms using higher-dimensional geometry.

Most small models today are just distilled copies of larger ones. LILA-E8 is different: it builds a native E8 Root System Lattice directly into the attention weights. By using the densest sphere packing in 8 dimensions, we minimize semantic friction (information loss) in the latent space.
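The post does not say exactly how the lattice enters the attention weights. As a minimal sketch under that uncertainty, and explicitly not the project's method, one standard way to use the E8 sphere packing is to snap 8-dimensional chunks of a weight vector to their nearest E8 lattice point with the Conway and Sloane decoder, which relies on E8 being the union of D8 (integer vectors with even coordinate sum) and D8 shifted by 1/2 in every coordinate. The helpers `nearest_d8`, `nearest_e8`, `quantize_e8` and the `scale` parameter are hypothetical.

```python
def nearest_d8(x):
    """Nearest point of D8 (integer vectors with even coordinate sum) to x."""
    f = [round(v) for v in x]
    if sum(f) % 2 != 0:
        # Odd sum: re-round the coordinate with the largest rounding error
        # to its other nearest integer, which restores an even sum.
        k = max(range(len(x)), key=lambda i: abs(x[i] - f[i]))
        f[k] += 1 if x[k] > f[k] else -1
    return [float(v) for v in f]


def nearest_e8(x):
    """Nearest point of E8 = D8 U (D8 + 1/2) to an 8-dimensional vector x."""
    a = nearest_d8(x)                                       # candidate from D8
    b = [v + 0.5 for v in nearest_d8([v - 0.5 for v in x])]  # candidate from the shifted coset

    def dist2(p):
        return sum((u - v) ** 2 for u, v in zip(x, p))

    return a if dist2(a) <= dist2(b) else b


def quantize_e8(weights, scale):
    """Hypothetical: snap a flat weight list (length a multiple of 8) to scale * E8."""
    out = []
    for i in range(0, len(weights), 8):
        chunk = [w / scale for w in weights[i:i + 8]]
        out.extend(scale * v for v in nearest_e8(chunk))
    return out


print(nearest_e8([0.9, 1.1, 0.1, 0, 0, 0, 0, 0]))  # [1.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]
print(nearest_e8([0.4] * 8))                        # [0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5]
```

Checking both cosets and keeping the closer candidate is what makes this exact for E8 rather than an approximation; the `scale` factor only sets how coarse the lattice grid is relative to the weights.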

The Results:

  • Efficiency: 40M parameters reaching 0.37 train / 0.44 validation loss on the TinyStories dataset (outperforming standard 60M baselines).
  • Stability: sustained coherence for 1000+ tokens without the semantic looping common in small-scale transformers.

By implementing the E8 exceptional Lie algebra directly into the attention weights, I’ve achieved a state I call "Geometric Resonance", which standard transformers simply cannot reach. It appeared at 200,000 steps as a phase shift in quality that typically requires 2-3x more parameters in standard architectures.

I’ve provided a 1-click Google Colab for instant verification of the weights and generation quality.

GitHub: https://github.com/SPUTNIKAI/sovereign-lila-e8
Colab: https://colab.research.google.com/github/SPUTNIKAI/sovereign...
Zenodo (preprint): https://zenodo.org/records/18731736
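To put the 0.37 / 0.44 losses in context, a mean per-token cross-entropy can be converted to perplexity and bits per token. This sketch assumes the reported figures are mean token-level cross-entropy in nats (natural log), which is what PyTorch's `cross_entropy` reports; the helper names are illustrative.

```python
import math


def perplexity(mean_ce_nats: float) -> float:
    """Per-token perplexity from a mean cross-entropy loss measured in nats."""
    return math.exp(mean_ce_nats)


def bits_per_token(mean_ce_nats: float) -> float:
    """The same loss expressed in bits per token."""
    return mean_ce_nats / math.log(2)


for name, loss in [("train", 0.37), ("val", 0.44)]:
    print(f"{name}: loss={loss:.2f} nats -> "
          f"ppl={perplexity(loss):.3f}, {bits_per_token(loss):.3f} bits/token")
```

Under that assumption, the losses correspond to perplexities of roughly 1.45 (train) and 1.55 (validation). Such numbers are only comparable across models that share the same tokenizer and evaluation split.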

I’m looking for feedback on expanding the context window to 4096 tokens and potentially porting this to the 24-dimensional Leech lattice (see also https://zenodo.org/records/18729723).

Similar Projects

AI/ML · ●●● · Banger

MaximusLLM – Train 262k-vocab LLMs on a single 16GB GPU

Ghost Logit math bypasses 262k vocab OOM without materializing full matrices.

Big Brain · Wizardry · Zero to One
yousef_g
202mo ago
AI/ML · ●●● · Banger

Glq LLM quantization using E8 lattice

E8 lattice codebooks beat GPTQ at 2-4 bpw with fused CUDA kernel skipping weight materialization.

Wizardry · Big Brain
acd
202d ago