GitHub Repository

Real-time hallucination detection for LLMs via Geometric Drift Analysis in Hidden States.

15 stars · Python

Detect LLM hallucinations via geometric drift (0.9 AUC, 1% overhead)

by yubainu·Feb 24, 2026·1 point·1 comment

AI Analysis

●● Solid · Big Brain · Wizardry · Ship It

Detects hallucinations from latent-space geometry rather than text analysis, but its 54% detection rate means many hallucinations still slip through.

Strengths
  • Novel signal: monitors hidden state trajectory instead of semantic output—fundamentally different from RAG/guardrail approaches.
  • Genuinely lightweight: O(d) per-token monitoring cost, with ~1% overhead measured on an RTX 3050; no quantization tricks or batch-only measurement.
  • Real-time intervention possible: can stop generation mid-stream before the hallucinated token appears, unlike post-hoc detection.
Weaknesses
  • 54% detection rate leaves nearly half of hallucinations undetected; at 88% precision, false positives still disrupt some correct outputs.
  • Evaluation limited to Gemma-2B on 1K samples; generalization to larger models, closed-source APIs, or multilingual benchmarks is unclear.
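The 88% precision figure follows from the reported 54% detection rate and 7% false-positive rate only under a particular class balance. The sketch below assumes a balanced test set, with half of the samples hallucinated. The actual balance of the N=1000 set is not stated. It applies Bayes' rule and shows how precision changes as hallucinations become rarer:

```python
def precision_from_rates(tpr: float, fpr: float, base_rate: float) -> float:
    """Precision (positive predictive value) from detection rate (TPR),
    false-positive rate (FPR), and the fraction of samples that are
    actually hallucinations (base_rate)."""
    true_pos = tpr * base_rate           # expected share: flagged hallucinations
    false_pos = fpr * (1.0 - base_rate)  # expected share: flagged correct outputs
    return true_pos / (true_pos + false_pos)

# Reported operating point: 54% detection, 7% false positives.
# Assumption: the listed base rates are illustrative, not from the project's data.
for base_rate in (0.5, 0.2, 0.05):
    p = precision_from_rates(0.54, 0.07, base_rate)
    print(f"base rate {base_rate:.0%}: precision {p:.1%}")
```

At a 50% base rate this gives about 88.5%, which matches the quoted figure. At a 5% base rate, precision falls to roughly 29%. In deployments where hallucinations are rare, the false-positive cost is therefore larger than the headline number suggests.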
Category
Target Audience

LLM researchers, inference engineers, and developers building safety-critical AI systems on consumer hardware.

Similar To

Anthropic's Constitutional AI (safety monitoring) · Hugging Face SafetyChecker · Together AI's GuardRail

Post Description

I built SIB-ENGINE, a real-time hallucination detection system that monitors LLM internal structure rather than output content.

KEY RESULTS (Gemma-2B, N=1000):
  • 54% hallucination detection with a 7% false-positive rate
  • <1% computational overhead (runs on an RTX 3050 with 4 GB VRAM)
  • ROC-AUC: 0.8995
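ROC-AUC can be read as the probability that a randomly chosen hallucinated sample receives a higher instability score than a randomly chosen correct one. AUC summarizes every possible threshold, and the 54% / 7% pair is just one operating point on that curve. The sketch below computes AUC by the rank (Mann-Whitney) definition. The scores are made up for illustration and are not taken from raw_logs.csv:

```python
def roc_auc(scores_pos: list[float], scores_neg: list[float]) -> float:
    """AUC as P(score_pos > score_neg) over all positive/negative pairs;
    ties count as one half (Mann-Whitney formulation)."""
    wins = 0.0
    for sp in scores_pos:
        for sn in scores_neg:
            if sp > sn:
                wins += 1.0
            elif sp == sn:
                wins += 0.5
    return wins / (len(scores_pos) * len(scores_neg))

# Hypothetical instability scores: higher means "less stable".
hallucinated = [0.9, 0.7, 0.4, 0.8]
correct = [0.2, 0.5, 0.3, 0.1]
print(roc_auc(hallucinated, correct))
```

The pairwise loop is O(n·m), which is fine for about 1000 samples. For larger sets, scikit-learn's `sklearn.metrics.roc_auc_score` computes the same quantity more efficiently.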

WHY IT'S DIFFERENT: Traditional methods analyze the output text semantically. SIB-ENGINE instead monitors "geometric drift" in hidden states during generation, identifying the structural collapse of the latent space before the first incorrect token is sampled.
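The post does not publish SIB-ENGINE's drift metric. One simple way to quantify drift, offered here only as an assumption for illustration, is the cosine distance between the hidden states of successive tokens at a fixed layer. In the sketch below, plain Python lists stand in for hidden-state vectors:

```python
import math

def cosine_similarity(u: list[float], v: list[float]) -> float:
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def drift_series(hidden_states: list[list[float]]) -> list[float]:
    """Hypothetical per-token drift: 1 - cos(h_t, h_{t-1}).
    Each step costs O(d) for d-dimensional hidden states."""
    return [1.0 - cosine_similarity(prev, cur)
            for prev, cur in zip(hidden_states, hidden_states[1:])]

# Toy 3-dimensional "hidden states": a steady trajectory, then an abrupt turn.
states = [[1.0, 0.0, 0.0], [0.9, 0.1, 0.0], [0.9, 0.2, 0.0], [0.0, 0.1, 1.0]]
print([round(d, 3) for d in drift_series(states)])
```

The first two steps produce near-zero drift and the last step produces a value close to 1. A threshold on such a series is the simplest possible "structural collapse" signal.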

This approach offers unique advantages:
  • Real-time intervention: stop generation mid-stream
  • Language-agnostic: no semantic analysis needed
  • Privacy-preserving: never reads the actual content
  • Extremely lightweight: works on consumer hardware
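Real-time intervention amounts to scoring each step before its token is emitted, then stopping once the score crosses a threshold. The sketch below assumes a hypothetical `step` function that returns the next token together with the hidden state that produced it, plus a hypothetical `monitor` callable. Neither is SIB-ENGINE's API:

```python
from typing import Callable, Iterator

# Hypothetical interfaces: tokens -> (next token, hidden state); hidden -> score.
Step = Callable[[list[str]], tuple[str, list[float]]]
Monitor = Callable[[list[float]], float]

def guarded_generate(step: Step, monitor: Monitor, prompt: list[str],
                     max_new_tokens: int, threshold: float) -> Iterator[str]:
    """Yield tokens one at a time; stop before emitting a token whose
    hidden state scores above `threshold`."""
    tokens = list(prompt)
    for _ in range(max_new_tokens):
        token, hidden = step(tokens)
        if monitor(hidden) > threshold:
            return  # halt mid-stream: the flagged token is never emitted
        tokens.append(token)
        yield token

def make_toy_step(script: list[tuple[str, list[float]]]) -> Step:
    def step(tokens: list[str]) -> tuple[str, list[float]]:
        return script[len(tokens) - 1]  # toy model; the prompt is one token
    return step

# Toy run: the third generated token carries an "unstable" hidden state.
script = [("sky", [0.2]), ("is", [0.1]), ("green", [0.9]), (".", [0.1])]
out = list(guarded_generate(make_toy_step(script), lambda h: h[0], ["The"], 4, 0.5))
print(out)
```

In a real decoder, the hidden state comes out of the same forward pass that produces the token. The check therefore adds only the monitor's own cost, not an extra model call.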

HOW IT WORKS: SIB-ENGINE monitors the internal stability of the model's computation. The system uses multiple structural signals to detect instability; the two primary indicators are:

Representation Stability: Tracking how the initial intent is preserved or distorted as it moves through the model's transformation space.

Cross-Layer Alignment: Monitoring the consensus of information processing across different neural depths to identify early-stage divergence.
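The post does not say how these two signals are computed. The definitions below are hypothetical illustrations only:

- "Representation stability" is taken as the cosine similarity between the current hidden state and a mean-pooled prompt representation.
- "Cross-layer alignment" is taken as the mean pairwise cosine similarity between one token's hidden states at several layers.

A standard decoder keeps the same width d across layers, so those vectors are directly comparable:

```python
import math
from itertools import combinations

def cos(u: list[float], v: list[float]) -> float:
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def mean_vector(vectors: list[list[float]]) -> list[float]:
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]

def representation_stability(prompt_states: list[list[float]],
                             current_state: list[float]) -> float:
    """Hypothetical: similarity of the current hidden state to the prompt's mean state."""
    return cos(mean_vector(prompt_states), current_state)

def cross_layer_alignment(layer_states: list[list[float]]) -> float:
    """Hypothetical: mean pairwise cosine similarity across layers for one token."""
    pairs = list(combinations(layer_states, 2))
    return sum(cos(u, v) for u, v in pairs) / len(pairs)

prompt = [[1.0, 0.0], [0.8, 0.2]]
print(representation_stability(prompt, [0.9, 0.1]))                 # aligned with prompt
print(cross_layer_alignment([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]))  # one layer disagrees
```

With Hugging Face Transformers, passing `output_hidden_states=True` to a model's forward call returns one hidden-state tensor per layer. That is where such vectors would come from in practice.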

When these (and other proprietary structural signals) deviate from the expected stable manifold, the system flags a potential hallucination before it manifests in the output.
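The project's flagging logic is described as proprietary. One common way to turn such signals into a flag, shown here as a generic sketch, works in two steps:

1. Calibrate the mean and standard deviation of each signal on generations known to be stable.
2. Flag a token when any signal's z-score exceeds a cutoff.

The signal names, reference values, and the cutoff of 3.0 below are arbitrary illustrative choices:

```python
import statistics

class DriftCalibrator:
    """Hypothetical detector: flags signal values far from a reference distribution.
    Each reference list needs at least two values with nonzero spread."""

    def __init__(self, reference: dict[str, list[float]], z_cutoff: float = 3.0):
        # Per-signal (mean, standard deviation) measured on known-stable runs.
        self.stats = {name: (statistics.fmean(vals), statistics.stdev(vals))
                      for name, vals in reference.items()}
        self.z_cutoff = z_cutoff

    def z_scores(self, signals: dict[str, float]) -> dict[str, float]:
        return {name: abs(value - self.stats[name][0]) / self.stats[name][1]
                for name, value in signals.items()}

    def is_unstable(self, signals: dict[str, float]) -> bool:
        return any(z > self.z_cutoff for z in self.z_scores(signals).values())

reference = {
    "stability": [0.95, 0.93, 0.96, 0.94, 0.95],
    "alignment": [0.80, 0.82, 0.79, 0.81, 0.80],
}
detector = DriftCalibrator(reference)
print(detector.is_unstable({"stability": 0.94, "alignment": 0.80}))
print(detector.is_unstable({"stability": 0.60, "alignment": 0.80}))
```

Raising the cutoff lowers the false-positive rate at the cost of detection rate. Sweeping it traces out the ROC curve behind the reported AUC.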

DEMO & CODE:
  • Demo video: https://www.youtube.com/watch?v=H1_zDC0SXQ8
  • GitHub: https://github.com/yubainu/sibainu-engine
  • Raw data: raw_logs.csv (full transparency)

LIMITATIONS:
  • Tested on Gemma-2B only (2.5B parameters)
  • Designed to scale, but needs validation on larger models
  • Catches "structurally unstable" hallucinations (about half)
  • Best used as a first-line defense in ensemble systems
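"First-line defense" suggests a cascade. The cheap structural check runs on every generation, and only flagged outputs go to an expensive verifier such as self-consistency sampling, a RAG lookup, or a judge model. A minimal sketch with hypothetical stand-in checks:

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class Verdict:
    flagged: bool
    escalated: bool  # whether the expensive checker was invoked

def cascade(output: str, structural_flag: bool,
            expensive_check: Callable[[str], bool]) -> Verdict:
    """Hypothetical two-stage pipeline: only structurally unstable outputs
    are escalated to the costly semantic checker."""
    if not structural_flag:
        return Verdict(flagged=False, escalated=False)  # cheap path
    return Verdict(flagged=expensive_check(output), escalated=True)

calls: list[str] = []

def fake_checker(text: str) -> bool:
    # Stand-in for a semantic verifier; records each invocation.
    calls.append(text)
    return "green" in text

print(cascade("The sky is blue.", False, fake_checker))
print(cascade("The sky is green.", True, fake_checker))
```

The structural stage catches only about half of hallucinations, so this arrangement saves compute but caps overall recall at the first stage's detection rate. Sending a random sample of unflagged outputs to the semantic check is one way to trade some of the savings back for coverage.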

TECHNICAL NOTES:
  • No external models needed (unlike self-consistency methods)
  • No knowledge bases required (unlike RAG approaches)
  • Adds ~1% inference time vs. 300–500% for semantic methods
  • Works by monitoring the process, not the product
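A back-of-envelope FLOP count shows why an O(d) monitor is negligible next to the forward pass. The sketch rests on three assumptions:

- Gemma-2B's published shape: about 2.5B parameters, hidden size d = 2048, 18 layers.
- The usual estimate of about 2 FLOPs per parameter per token for a forward pass.
- A hypothetical monitor that computes three cosine similarities per layer.

Measured wall-clock overhead, such as the reported ~1%, is likely dominated by hook and data-movement costs rather than this arithmetic:

```python
def forward_flops_per_token(n_params: float) -> float:
    # Common approximation: one multiply-add (2 FLOPs) per parameter per token.
    return 2.0 * n_params

def monitor_flops_per_token(d: int, n_layers: int, cosines_per_layer: int) -> float:
    # A cosine similarity needs three length-d dot products (u.v, u.u, v.v),
    # roughly 6*d FLOPs. Assumption: hypothetical monitor design.
    return n_layers * cosines_per_layer * 6.0 * d

fwd = forward_flops_per_token(2.5e9)
mon = monitor_flops_per_token(d=2048, n_layers=18, cosines_per_layer=3)
print(f"monitor / forward = {mon / fwd:.6%}")
```

The monitor's arithmetic comes to about 0.013% of the forward pass. Even a considerably heavier per-token signal would stay well below the ~1% budget.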

I'd love feedback on:
  • Validation on larger models (Seeking strategic partnerships and compute resources for large-scale validation.)
  • Integration patterns for production systems
  • Comparison with other structural approaches
  • Edge cases where geometric signals fail

This represents a fundamentally different paradigm: instead of asking "is this text correct?", we ask "was the generation process unstable?" The answer is surprisingly informative.

Happy to discuss technical details in the comments!

Similar Projects

AI/ML · ●● Solid

UQLM – Closed-book hallucination detection with UQ

Peer-reviewed LLM hallucination detector using uncertainty quantification, published in JMLR and TMLR.

Niche Gem · Solve My Problem
by virenbajaj · 313d ago