GitHub Repository

A Geometric Attention Transformer with the E8 Root System: Sovereign-Lila-E8 (Lie Lattice Attention Language Model)

10 stars · Jupyter Notebook

Lila-E8 – 40M Parameter LLM with 0.37 Loss via E8 Lattice Attention

Author: bootstraptor

by bootstraptor · Feb 24, 2026 · 1 point · 0 comments


AI Analysis

● Mid · Big Brain · Wizardry

E8 lattice geometry replaces attention—clever math, but TinyStories 0.37 loss needs context.

Strengths

•Applies E8 lattice geometry (the 240 roots of the 8-dimensional E8 root system) to attention systematically
•Reports longer coherence (1500 tokens) than standard baselines, without semantic collapse
•Novel mathematical framing, not just parameter tweaking
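
The coherence claim above could be checked with a simple repetition metric. The sketch below is one possible proxy for "semantic looping": the fraction of n-grams in a generated sequence that repeat an earlier n-gram. The function name `repeated_ngram_fraction` and the choice of n are illustrative assumptions; this is not the evaluation used by the project.

```python
from collections import Counter
from typing import Hashable, Sequence


def repeated_ngram_fraction(tokens: Sequence[Hashable], n: int = 4) -> float:
    """Fraction of n-grams that duplicate an n-gram seen elsewhere in `tokens`.

    0.0 means every n-gram is unique; values near 1.0 suggest the text is
    stuck in a loop. Hypothetical helper, not part of the Lila-E8 repository.
    """
    ngrams = [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]
    if not ngrams:
        return 0.0
    counts = Counter(ngrams)
    # Each distinct n-gram contributes (occurrences - 1) repeats.
    repeats = sum(c - 1 for c in counts.values())
    return repeats / len(ngrams)


# A sequence that cycles through three tokens versus one with no repetition.
looping = ["the", "cat", "sat"] * 10
varied = list(range(30))
loop_score = repeated_ngram_fraction(looping, n=3)
varied_score = repeated_ngram_fraction(varied, n=3)
```

Comparing this score between a baseline model and Lila-E8 on generations of equal length would make the "no looping" claim measurable, although it only captures surface repetition and not semantic drift.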

Weaknesses

•No comparison with other efficient small models (BitNet, MobileLLM) on the same benchmarks
•TinyStories is a toy dataset; the real-world scaling benefit is unknown, and the claims lack peer review

Post Description

I’m excited to release Sovereign-Lila-E8, a novel transformer architecture that replaces standard attention mechanisms with a native E8 Root System Lattice. While the industry is brute-forcing intelligence with trillions of parameters, I went "outside" the system to find a zero-viscosity solution.

I built Sovereign-Lila-E8 because I wanted to see if we could bypass the 'viscosity' of standard attention mechanisms using higher-dimensional geometry.

Most small models today are just distilled copies of larger ones. LILA-E8 is different: it implements a native E8 Root System Lattice directly into the attention weights. By using the densest sphere packing in 8 dimensions, we minimize semantic friction (information loss) in the latent space.
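
For readers unfamiliar with the E8 root system named above, the following minimal sketch builds its 240 roots in 8 dimensions using the standard textbook construction and checks a few well-known properties. It is not taken from the Sovereign-Lila-E8 repository, the function name `e8_roots` is a hypothetical choice, and it does not show how the project wires these vectors into attention.

```python
import itertools

import numpy as np


def e8_roots() -> np.ndarray:
    """Return the 240 roots of E8 as rows of a (240, 8) array.

    Standard construction (illustrative helper, not the project's code):
      * 112 vectors with two entries equal to +1 or -1 and six zeros;
      * 128 vectors with every entry +1/2 or -1/2 and an even number
        of minus signs.
    """
    roots = []
    # Integer-type roots: choose 2 of 8 positions, then 4 sign patterns.
    for i, j in itertools.combinations(range(8), 2):
        for si, sj in itertools.product((1.0, -1.0), repeat=2):
            v = np.zeros(8)
            v[i], v[j] = si, sj
            roots.append(v)
    # Half-integer roots: keep sign patterns with an even number of minuses.
    for signs in itertools.product((0.5, -0.5), repeat=8):
        if sum(s < 0 for s in signs) % 2 == 0:
            roots.append(np.array(signs))
    return np.stack(roots)


roots = e8_roots()
gram = roots @ roots.T  # pairwise inner products between all roots

# Every root has squared length 2, and inner products are integers in [-2, 2].
assert roots.shape == (240, 8)
assert np.allclose(np.diag(gram), 2.0)
assert set(np.unique(gram).tolist()) <= {-2.0, -1.0, 0.0, 1.0, 2.0}
```

The 240 roots are the shortest nonzero vectors of the E8 lattice, which is why the lattice's kissing number is 240. Note that they live in ordinary 8-dimensional Euclidean space.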

The Results:

Efficiency: 40M parameters achieving 0.37 Train / 0.44 Val Loss on the TinyStories dataset (outperforming standard 60M baselines).

Stability: Sustained coherence for 1000+ tokens without the common semantic looping seen in small-scale transformers.

By implementing the E8 exceptional Lie algebra directly into the attention weights, I’ve achieved a state of "Geometric Resonance" that standard transformers simply cannot reach. At 200,000 steps, the model achieved a state of 'Geometric Resonance': a phase shift in quality that typically requires 2-3x more parameters in standard architectures.

I’ve provided a 1-click Google Colab for instant verification of the weights and generation quality.

GitHub: https://github.com/SPUTNIKAI/sovereign-lila-e8
Colab: https://colab.research.google.com/github/SPUTNIKAI/sovereign...
Zenodo (Preprint): https://zenodo.org/records/18731736
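
To put the reported loss figures in context, the sketch below converts them to perplexity and computes the train/validation gap. It assumes the losses are mean per-token cross-entropy in nats (the usual convention when training with a natural-log cross-entropy loss); the post does not state the unit, so treat the output as illustrative only.

```python
import math


def perplexity(loss_nats: float) -> float:
    """Perplexity for a mean per-token cross-entropy given in nats."""
    return math.exp(loss_nats)


# Figures quoted in the post; the unit (nats per token) is an assumption.
train_loss, val_loss = 0.37, 0.44

train_ppl = perplexity(train_loss)  # roughly 1.45
val_ppl = perplexity(val_loss)      # roughly 1.55
gap = val_loss - train_loss         # generalization gap in nats

print(f"train perplexity: {train_ppl:.3f}")
print(f"val perplexity:   {val_ppl:.3f}")
print(f"val - train loss: {gap:.2f}")
```

Perplexities this low are plausible on TinyStories because of its small vocabulary and simple text, so the numbers are only meaningful next to a baseline trained with the same tokenizer and data split.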

Looking for feedback on expanding the context window to 4096 and potentially porting this to the 24D Leech Lattice. (See also https://zenodo.org/records/18729723)