GitHub Repository

Structural observation for AI agents. Detects loops and predicts failure at step 10 with AUC=0.814. Validated on 80K real sessions. caum.systems

1 star

CAUM – 80K AI agent sessions analyzed. 88.7% loops fail. AUC=0.814

Author: Caum

by Caum · Apr 1, 2026 · 1 point · 0 comments


AI Analysis

●● Solid · Big Brain · Niche Gem

Predicts agent failure at step 10 without reading prompts or payloads.
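The listing does not say how CAUM detects loops, so the following is only a hypothetical sketch of what "structural observation" can mean: it looks at nothing but the sequence of tool names an agent calls, never at prompts or payloads, and flags a session as looping when too many tool-call bigrams in the first 10 steps repeat. The function names, the bigram window, and the 0.5 threshold are illustrative assumptions, not CAUM's metrics (such as Tool Coherence Ratio) or its method.

```python
from collections import Counter
from typing import Sequence


def repeated_ngram_ratio(tool_names: Sequence[str], n: int = 2) -> float:
    """Hypothetical metric: fraction of tool-name n-grams that repeat an earlier one.

    Only tool names are inspected; prompts and payloads are never read.
    """
    if len(tool_names) < n:
        return 0.0
    grams = [tuple(tool_names[i:i + n]) for i in range(len(tool_names) - n + 1)]
    counts = Counter(grams)
    # Every occurrence after the first of a given n-gram counts as a repeat.
    repeats = sum(c - 1 for c in counts.values())
    return repeats / len(grams)


def looks_like_loop(tool_names: Sequence[str], step: int = 10,
                    n: int = 2, threshold: float = 0.5) -> bool:
    """Hypothetical early-warning check using only the first `step` tool calls.

    The threshold is an arbitrary illustrative value, not a tuned one.
    """
    prefix = list(tool_names[:step])
    return repeated_ngram_ratio(prefix, n) >= threshold


if __name__ == "__main__":
    stuck = ["search", "read"] * 5  # the agent alternates between the same two tools
    varied = ["plan", "search", "read", "edit", "test", "lint", "commit", "push", "review", "merge"]
    print(looks_like_loop(stuck), looks_like_loop(varied))
```

In this sketch the alternating session yields a repeat ratio of 7/9 and is flagged, while the session with ten distinct tools yields 0.0 and is not. A real monitor would need a threshold calibrated on labeled sessions, which is the kind of validation the AUC figure above refers to.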

Strengths

• Validated on 80K real sessions across Llama 8B, 70B, and 405B models.
• Structural observation preserves privacy by never inspecting prompts or data.
• Five metrics, such as Tool Coherence Ratio, provide actionable debugging signals.

Weaknesses

• Presents research findings rather than a production-ready monitoring tool to integrate.
• No clear integration path for existing agent frameworks such as LangChain.

Similar Projects

AI/ML ●● Solid

Skawld – Open-source SDK for company-specific AI agents

An agent SDK with SQLite session persistence, entering a space LangChain already dominates.

Ship It · Niche Gem

MikahDang

3214d ago

Developer Tools ●● Solid

Tiny agentic loop with Docker sandbox

A thirty-line agent loop whose Docker sandbox safely contains the blast radius.

Cozy · Big Brain

everlier

101mo ago

Developer Tools ●●● Banger

Termem, cross-agent memory and session management

Claude can read Codex sessions: cross-agent memory without network calls.

Zero to One · Big Brain · Niche Gem

todience

202d ago

Developer Tools ●●● Banger

L88-Full – Looking for feedback, bug fixes, and contributors

Self-correcting LangGraph RAG with local LLM, hybrid retrieval, and role-based multi-user workspace.

Zero to One · Wizardry · Ship It

hundredtrillion

113mo ago

Education ○ Pass

I'm making a tutorial, "Zero to LLM Agent", and it wrote its own agent loop

Blog posts about building LLM agents, not a tool or framework.

macrolet

103mo ago

Developer Tools ●● Solid

Looplet – a 0-dep agent loop you own

Iterator-first design beats black-box frameworks like LangChain for debugging.

Big Brain · Niche Gem

hsaghir

201mo ago