An "earned autonomy" architecture for AI agents using Subjective Logic

Author: ken_neth

by ken_neth·Feb 24, 2026·1 point·0 comments


AI Analysis

●●● Banger · Big Brain · Solve My Problem

Bayesian earned autonomy for AI agents replaces the binary permission model, and is grounded in how trust develops between people.

Strengths

•Subjective Logic + Beta distributions elegantly model trust decay, recovery, and domain-specificity
•Solves real approval fatigue problem: agents graduate from draft-first to autonomous action
•Framed around how humans actually onboard staff, not black-box optimization

Weaknesses

•Single blog post, no open-source code or runnable examples yet
•Unclear how decay function parameters are tuned, or if it generalizes beyond game studio ops
•No comparison to other agent autonomy frameworks or user studies on approval fatigue reduction

Post Description

Most agent systems treat autonomy as binary: the agent either does the thing or asks permission first. In practice, this means you end up rubber-stamping a stream of approval requests until you stop paying attention. The system designed to keep you in control trains you to stop caring.

To manage operations for my independent video game studio, I built a trust system that works more like onboarding a new hire. Agents start in draft mode (every action needs approval), and earn autonomy over time based on their track record in specific task categories.

The core idea: each agent maintains a separate Beta distribution per task category (support triage, expense reports, publisher emails, etc.). A Beta distribution is basically a track record parameterized by successes and failures. But the raw mean E[p] = α/(α+β) can barely tell the difference between "9 successes, 0 failures" and "90 successes, 10 failures": starting from a uniform Beta(1, 1) prior, the first gives 10/11 ≈ 0.91 and the second 91/102 ≈ 0.89, so both come out at E[p] ≈ 0.90. So I use Jøsang's Subjective Logic to map these to opinion tuples (belief, disbelief, uncertainty, base rate) that explicitly separate belief from uncertainty. High uncertainty means "not enough data yet," which is different from "we know this agent is bad."
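As a rough illustration of the per-category track record, here is a minimal sketch. It assumes each category starts from a uniform Beta(1, 1) prior and that every reviewed action counts as one success or one failure; the `TrackRecord` class and the dictionary keys are hypothetical names, not taken from the post's implementation.

```python
from dataclasses import dataclass


@dataclass
class TrackRecord:
    """Beta(alpha, beta) track record; the Beta(1, 1) starting point is an assumption."""
    alpha: float = 1.0
    beta: float = 1.0

    def record(self, success: bool) -> None:
        # A success adds one to alpha, a failure adds one to beta.
        if success:
            self.alpha += 1
        else:
            self.beta += 1

    @property
    def mean(self) -> float:
        # E[p] = alpha / (alpha + beta)
        return self.alpha / (self.alpha + self.beta)


# One record per task category (hypothetical keys).
records = {"support_triage": TrackRecord(), "expense_reports": TrackRecord()}

for _ in range(9):
    records["support_triage"].record(True)       # 9 successes, 0 failures
for i in range(100):
    records["expense_reports"].record(i >= 10)   # 10 failures, then 90 successes

for name, rec in records.items():
    observed = rec.alpha + rec.beta - 2           # evidence beyond the prior
    print(f"{name}: E[p] = {rec.mean:.3f} from {observed:.0f} observations")
```

Both means land near 0.90 (10/11 and 91/102), even though one record rests on 9 observations and the other on 100, which is the gap the opinion tuple is meant to expose.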

Every action passes through a gate:

VoI = stakes × (1 - trust) × uncertainty

Low VoI = auto-execute. High VoI = draft for human review. Static trust thresholds set the maximum autonomy level an agent can reach (Auto-Execute, Soft-Execute, Draft, Restricted), and VoI acts as a secondary gate that can restrict it further based on context — an agent might qualify for auto-execute in general, but a high-stakes situation still gets flagged.
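The two-stage gate could be sketched roughly as follows. The four level names come from the post; the threshold values, the VoI cutoff, the scale of `stakes`, and every function name are assumptions made for illustration only.

```python
from enum import IntEnum


class Level(IntEnum):
    # Ordered from least to most autonomous; names from the post.
    RESTRICTED = 0
    DRAFT = 1
    SOFT_EXECUTE = 2
    AUTO_EXECUTE = 3


# Hypothetical thresholds: minimum trust needed to reach each level.
TRUST_THRESHOLDS = [
    (0.95, Level.AUTO_EXECUTE),
    (0.85, Level.SOFT_EXECUTE),
    (0.50, Level.DRAFT),
]
VOI_DRAFT_CUTOFF = 0.05  # hypothetical: at or above this VoI, a human reviews


def max_level(trust: float) -> Level:
    """Static ceiling: the highest level whose threshold the trust score meets."""
    for threshold, level in TRUST_THRESHOLDS:
        if trust >= threshold:
            return level
    return Level.RESTRICTED


def value_of_information(stakes: float, trust: float, uncertainty: float) -> float:
    # VoI = stakes x (1 - trust) x uncertainty; stakes is assumed to be an
    # open-ended positive score, trust and uncertainty lie in [0, 1].
    return stakes * (1.0 - trust) * uncertainty


def gate(stakes: float, trust: float, uncertainty: float) -> Level:
    """VoI can only lower the level granted by the static thresholds."""
    ceiling = max_level(trust)
    if value_of_information(stakes, trust, uncertainty) >= VOI_DRAFT_CUTOFF:
        return min(ceiling, Level.DRAFT)
    return ceiling


routine = gate(stakes=1, trust=0.96, uncertainty=0.02)
high_stakes = gate(stakes=100, trust=0.96, uncertainty=0.02)
print(routine.name, high_stakes.name)
```

With these made-up numbers the same agent auto-executes the routine action but is pushed back to draft when the stakes are 100 times higher, matching the behavior described above.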

Three things that made the biggest difference:

1. Edit distance feedback. If you rewrite half an email before hitting "approve," the system notices. A 0% edit = full trust credit. A 71%+ rewrite = penalty. This single change prevented agents from reaching auto-execute on work users were quietly fixing.

2. Time-based decay. Trust scores decay daily for inactive categories (λ = 0.95). If an agent hasn't done a task in two months, it gets supervised again. This also handles model upgrades, since the track record was earned on a different model.

3. Weakest-link chains. Multi-step workflows (send welcome email → create project → schedule meeting → notify team) use a weakest-link model. If any step needs approval, the whole chain surfaces as one inbox item. Nothing runs until you approve the full picture.
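The three mechanisms above can be sketched in a few lines each. Everything here is an assumption layered on the description: `difflib` similarity stands in for whatever edit-distance metric the system uses, the linear taper of credit between 0% and 71% is invented, decay is applied to the accumulated evidence (pulling it back toward a Beta(1, 1) prior) rather than to a score directly, and all function names are hypothetical.

```python
import difflib

LEVELS = ["Restricted", "Draft", "Soft-Execute", "Auto-Execute"]  # least to most autonomous


def edit_ratio(draft: str, final: str) -> float:
    """Fraction of the draft the reviewer changed (difflib-based stand-in metric)."""
    return 1.0 - difflib.SequenceMatcher(None, draft, final).ratio()


def feedback(ratio: float) -> tuple[float, float]:
    """Return (alpha_increment, beta_increment) for one approved action."""
    if ratio >= 0.71:
        return 0.0, 1.0              # heavy rewrite counts as a failure
    return 1.0 - ratio / 0.71, 0.0   # hypothetical linear taper of credit


def decay(alpha: float, beta: float, idle_days: int, lam: float = 0.95) -> tuple[float, float]:
    """Shrink accumulated evidence toward Beta(1, 1) by a factor lam per idle day."""
    factor = lam ** idle_days
    return 1.0 + (alpha - 1.0) * factor, 1.0 + (beta - 1.0) * factor


def chain_level(step_levels: list[str]) -> str:
    """Weakest link: the chain is treated at the level of its least-trusted step."""
    return min(step_levels, key=LEVELS.index)


# An untouched draft earns full credit; a near-total rewrite earns a penalty.
print(feedback(edit_ratio("Thanks for reaching out!", "Thanks for reaching out!")))
print(feedback(0.8))

# 90 successes and 10 failures, then 60 idle days.
print(decay(91.0, 11.0, idle_days=60))

# Welcome email, create project, schedule meeting, notify team.
print(chain_level(["Auto-Execute", "Auto-Execute", "Draft", "Soft-Execute"]))
```

Under this reading of decay, 0.95^60 is roughly 0.05, so after two idle months only about 5% of the earned evidence remains and the category's uncertainty climbs back toward that of a new agent.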

The core mapping from track record to opinion looks like this:

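    from collections import namedtuple

    # Stand-in for the Opinion type, whose definition is not shown here.
    Opinion = namedtuple("Opinion", "belief disbelief uncertainty base_rate")

    def beta_to_opinion(alpha, beta, base_rate=0.5):
        # alpha - 1 and beta - 1 are the observed successes and failures
        # on top of a uniform Beta(1, 1) prior.
        n = alpha + beta
        return Opinion(
            belief=(alpha - 1) / n,
            disbelief=(beta - 1) / n,
            uncertainty=2 / n,  # shrinks toward 0 as evidence accumulates
            base_rate=base_rate,
        )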
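To see what the mapping buys, here is a small self-contained sketch that applies it to the two track records from earlier. The `Opinion` class is a stand-in written for this example (its definition is not shown in the post), and `projected` is the standard Subjective Logic projected probability P = b + a·u, added here for comparison with E[p].

```python
from typing import NamedTuple


class Opinion(NamedTuple):  # stand-in type, an assumption of this sketch
    belief: float
    disbelief: float
    uncertainty: float
    base_rate: float

    def projected(self) -> float:
        # Subjective Logic projected probability: P = b + a * u
        return self.belief + self.base_rate * self.uncertainty


def beta_to_opinion(alpha, beta, base_rate=0.5):
    # Same mapping as above, restated so this block runs on its own.
    n = alpha + beta
    return Opinion((alpha - 1) / n, (beta - 1) / n, 2 / n, base_rate)


new_agent = beta_to_opinion(10, 1)   # 9 successes, 0 failures on a Beta(1, 1) prior
veteran = beta_to_opinion(91, 11)    # 90 successes, 10 failures

for name, op in [("new_agent", new_agent), ("veteran", veteran)]:
    print(
        f"{name}: b={op.belief:.3f} d={op.disbelief:.3f} "
        f"u={op.uncertainty:.3f} P={op.projected():.3f}"
    )
```

The projected probabilities stay close together (about 0.91 and 0.89, equal to α/(α+β) when the base rate is 0.5), while uncertainty differs by almost an order of magnitude (2/11 versus 2/102). That difference is the signal a gate can use to keep the newer agent supervised.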

The math is all well-established (Beta distributions, Subjective Logic, Value of Information). The part that worked was combining them into something that mirrors how trust actually develops between people.

Article with full implementation details, code examples, and diagrams: https://kenschachter.substack.com/p/earned-autonomy