I applied Markowitz port. theory to agent teams / proved it in a zkVM

Name: I applied Markowitz port. theory to agent teams / proved it in a zkVM
Availability: InStock
Author: alexgarden

by alexgarden·Feb 24, 2026·2 points·0 comments

Visit Project View on HN

AI Analysis

●●●BangerWizardryBig BrainZero to One

ZK proofs for agent audits—sidesteps proving LLM correctness by proving auditor honesty instead.

Strengths

•Novel framing: proves auditor integrity, not model correctness—avoids 150s/token LLM proof overhead.
•Interactive demo shows real multi-agent orchestration with cryptographic certificates and alignment scoring.
•Genuine differentiator: agent auditing was either logged (mutable) or unproven; this closes the gap with cryptographic proof.

Weaknesses

•Currently a showcase, not a shipping product—no API docs, pricing, or public adoption signals.
•zkVM overhead and latency in production remain unquantified; 'faster than Lagrange' needs benchmarks.

Post Description

I run multi-agent teams in high-consequence scenarios. Read: fuckups at 3 AM = I'm awake.

I kept hitting the same issue. I couldn't get a rules-based system to enforce behavior and I had no real way to prove that agents really did what they said they did. I can log and monitor them - set up (a million) Slack alerts but none of these things are PROOF. Logs are mutable. And that matters more every day as agents get more powerful (take THAT, @meta)

So I went down the rabbit hole.

The obvious answer is zero-knowledge proofs. Prove behavior cryptographically. Except proving an LLM inference in a zkVM is computationally Star Trek. Lagrange proved GPT-2 E2E, Polyhedra can do Llama-3 at 150 seconds per token — production-scale is still hours, not seconds.

The a-ha: I don't need to prove the model is correct. I need to prove the auditor is honest.

My system intercepts agent thinking blocks (Claude, OpenAI, Gemini), analyzes them against a behavioral contract, and produces a verdict: clear, review needed, or boundary violation. That derivation is deterministic — ~10,000 RISC-V cycles. Provable today.

So I built a guest program inside SP1's zkVM (on Modal) that re-derives the verdict from scratch, ignoring what the auditor claimed, and generates a STARK proof. If the auditor said "clear" but the evidence warranted "violation," the proof fails. The auditor cannot lie.

Quis custodiet ipsos custodes (Who watches the watchmen?) — answered with math. Sub-second on GPU.

OK... pretty cool, but what ELSE can you do with it?

Great question! Once I had provable individual verdicts: what about teams? Can I prove the group is safe?

I ended up applying financial risk theory to AI agent fleets (things I never expected to be doing with my life). CoVaR for tail risk — one bad agent in a group of four good ones doesn't average out to "fine." Markowitz portfolio theory for coherence — treating value alignment like diversification. DebtRank for contagion — if Agent A fails, who's exposed? Originally designed for bank failures. Works disturbingly well for agents.

Then I needed Shapley attribution for individual risk. Except real Shapley is exponential (2^n subsets), and Monte Carlo introduces randomness. Randomness = non-determinism = unprovable in a zkVM. Leave-One-Out approximation: deterministic, O(n²), the only Shapley variant that works inside a prover.

Oh, and all of it runs in Q16.16 fixed-point arithmetic (i32) because floating-point produces different results on different architectures, and "different results" inside a zkVM = worthless proof. I implemented exp, sqrt, and clamp from scratch in integer math. Casting spells at 2 AM in the dark again.

The whole stack — CoVaR, Markowitz, DebtRank, Shapley, circuit breakers — computes in TypeScript on Cloudflare Workers (instant), then re-derives in Rust inside the zkVM (provable). Both produce identical results. If they don't, something is very wrong.

So what?

Every agent accumulates cryptographically attested checkpoints — Ed25519 signatures, SHA-256 hash chains, Merkle trees, STARK proofs — and earns a Trust Score. Credit rating for AI agents, AAA to CCC. The score isn't an opinion. It's a computation over evidence anyone can independently verify. FICO computes scores from data you can't inspect. This computes scores from data anyone can cryptographically verify.

Everything I described here is live code. Four agents handling a production incident — coherence matrix, trust topology, Merkle visualization, drift detection: https://mnemom.ai/showcase

Apache-licensed. Zero-code gateway: npm install -g @mnemom/smoltbot && smoltbot register

GitHub: github.com/mnemom | Docs: docs.mnemom.ai