Morph Reflexes – Multi-head classifiers for agent traces

by bhaktatejas922 · Jun 30, 2026 · 11 points · 2 comments

AI Analysis

●●● Banger · Big Brain · Wizardry

99% prefill compute reuse enables sub-30ms agent behavior classification.

Strengths
  • Adapts 2019-era BERT/HYDRA multi-head techniques to a modern transformer architecture
  • Under 2ms overhead whether running 4 or 100 reflexes simultaneously
  • Solves the real scaling problem of judging millions of agent turns
Weaknesses
  • Landing page requires sign-in, limiting public verification of claims
  • Unclear how this compares to simpler rule-based failure detection
Category
Target Audience

Teams running production AI agents at scale

Similar To

LangSmith · Arize Phoenix · Helicone

Post Description

The most common failures for production agents are behavioral: looping, reasoning leakage, user frustration, and more. Using a frontier model like GPT or Sonnet to judge every turn is too expensive and slow to run at scale.

How it works:

We use a modern LLM with hybrid attention and remove the decode step. We built an inference engine that reuses 99% of prefill compute from one reflex to the next, similar in spirit to 2019-era BERT/HYDRA and other older multi-head techniques.

We took the same high-level idea and did the hard work to make it run on a modern architecture and attention design. On our engine, inference runs in under 30ms and the full request is served in under 90ms. Whether you run 4 reflexes or 100, the extra overhead is less than 2ms.
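To make the shared-prefill idea concrete, here is a minimal toy sketch in Python. It is not Morph's engine: `encode_trace`, `make_head`, `run_head`, and the sizes are hypothetical stand-ins, and the "prefill" is a fake pooled embedding rather than a real transformer pass. It only illustrates the shape of the technique: the expensive encoding runs once per trace, and each reflex adds a small linear head on top of the same hidden state.

```python
import math
import random

HIDDEN = 16  # hypothetical toy width, not a number from the post


def encode_trace(tokens, calls):
    """Stand-in for the expensive prefill pass: one pooled vector per trace."""
    calls["prefill"] += 1
    vec = [0.0] * HIDDEN
    for tok in tokens:
        rng = random.Random(tok)  # deterministic pseudo-embedding per token
        for i in range(HIDDEN):
            vec[i] += rng.uniform(-1.0, 1.0)
    n = max(len(tokens), 1)
    return [v / n for v in vec]


def make_head(name, seed):
    """A toy reflex head: one random weight vector plus a bias."""
    rng = random.Random(seed)
    weights = [rng.uniform(-1.0, 1.0) for _ in range(HIDDEN)]
    return {"name": name, "w": weights, "b": 0.0}


def run_head(head, hidden):
    """Cheap per-reflex work: a dot product followed by a sigmoid."""
    z = sum(w * h for w, h in zip(head["w"], hidden)) + head["b"]
    return 1.0 / (1.0 + math.exp(-z))


def classify(tokens, heads):
    calls = {"prefill": 0}
    hidden = encode_trace(tokens, calls)  # computed once, shared by all heads
    scores = {h["name"]: run_head(h, hidden) for h in heads}
    return scores, calls["prefill"]


trace = "user: this is the third time I am asking".split()
heads = [make_head(f"reflex_{i}", seed=i) for i in range(100)]
scores, prefill_calls = classify(trace, heads)
print(prefill_calls, len(scores))
```

In this toy, going from 4 heads to 100 only adds more dot products, while the encoding cost stays fixed. That is the property the latency numbers above depend on, though how the real engine achieves it is not described in the post.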

Why does optimizing this matter?

If you’re even a medium-sized startup, you’re dealing with tens of thousands of agent runs and millions of turns. If you want to track things like user frustration rates over time, frontier LLM-as-judge does not scale.

I built a similar stack at Tesla. When ML engineers needed to sample data across petabytes for signals like `is_camera_obfuscated=true`, along with 200 other things, they needed to 1) spin those signals up quickly and 2) run them efficiently at scale.

What it is not:

A dashboard. In my experience, 99% of dashboards go unused. This is purely API-based and made for devs who want to track agent behavior themselves, trigger their own alerts, and build on it.
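As a rough illustration of the "track it yourself and trigger your own alerts" workflow, here is a small Python sketch that keeps a rolling rate for one boolean signal and flags when it crosses a threshold. Everything in it is an assumption for illustration: the `RateAlert` class, the window and threshold values, and the per-turn flags are made up, and no Morph API call is shown.

```python
from collections import deque


class RateAlert:
    """Rolling rate of a boolean signal over the last `window` turns."""

    def __init__(self, window, threshold):
        self.flags = deque(maxlen=window)
        self.threshold = threshold

    def observe(self, fired):
        self.flags.append(bool(fired))
        rate = sum(self.flags) / len(self.flags)
        # Only alert once the window is full, to avoid noisy early readings.
        full = len(self.flags) == self.flags.maxlen
        return rate, full and rate >= self.threshold


monitor = RateAlert(window=5, threshold=0.4)

# Hypothetical per-turn outputs of a "user frustration" reflex.
turn_flags = [False, False, True, False, False, True, True, False]

alerts = []
for turn, fired in enumerate(turn_flags):
    rate, alert = monitor.observe(fired)
    if alert:
        alerts.append((turn, rate))

print(alerts)
```

In practice the flags would come from whatever the classification API returns per turn, and the alert branch would call your own paging or logging code.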

You can vibetrain a custom reflex in our dashboard, and then let it self-improve in production: https://www.morphllm.com/dashboard/reflex

Docs: https://docs.morphllm.com/sdk/components/reflexes/index

I’d love feedback from people running agents in prod: what sorts of things do you wish you could track over time across 100% of turns?

TLDR: semantic signals from agent traces, super fast, cheap via API

Similar Projects

AI/ML · ●●●● Gem

I applied Lyapunov stability theory to detect when LLM agents spiral

Lyapunov stability theory catches token spirals before your budget explodes.

Big Brain · Zero to One · Solve My Problem
visha1v
11220d ago
AI/ML · ●●● Banger

I applied Lyapunov stability theory to detect when LLM agents spiral

Lyapunov stability theory applied to LLM agents — classifies failures with zero extra API calls.

Big Brain · Wizardry
visha1v
309d ago