GitHub Repository

Open-source permission control plane for AI agents. Scan, enforce, and audit every tool call.

18 stars · Python

AgentWard – After an AI agent deleted files, I built a runtime enforcer

Author: ratnaditya

by ratnaditya · Feb 23, 2026 · 1 point · 1 comment


AI Analysis

●●● Banger · Zero to One · Solve My Problem · Big Brain

First runtime permission layer for agents—detects risky tool chains and enforces policies outside LLM context.

Strengths

• Solves a genuine gap: prompt rules are bypassable; this enforces at the proxy layer, where injection can't reach
• Detects dangerous skill *combinations* (email+browser→exfil), not just individual tools, which shows real security thinking
• One-command init with auto-scanning and sensible defaults lowers friction; audit logging satisfies compliance needs

Weaknesses

• Currently scoped to OpenClaw/Cursor/Claude Desktop; ecosystem lock-in limits appeal to broader agent frameworks
• No public incident report or security audit that proves the enforcement model is airtight

Post Description

I've spent time working on AI safety and kept running into the same problem: AI agents have far more access than they need, and the only thing stopping them from misusing it is a prompt. Prompts can be ignored. They can be overridden by prompt injection. They're not enforcement; they're a suggestion.

AgentWard is a proxy layer that sits between your agent and its tools and enforces permissions in code, outside the LLM context window. No matter what the model decides, the policy is what actually runs.

What it does:

• Scans your OpenClaw skills and flags risky permissions
• Detects dangerous skill combinations: pairs that are low-risk individually but become high-risk when chained together (email + web browser → data exfiltration path)
• Enforces a YAML policy at runtime: ALLOW, BLOCK, APPROVE, REDACT
• Logs everything for audit
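To make the enforcement idea concrete, here is a minimal Python sketch of a proxy that applies the four actions and flags risky skill pairs. It is an illustration under assumptions, not AgentWard's actual code: the policy keys (default, tools, risky_pairs), the Proxy class, the risky_combinations helper, and the toy secret pattern are all hypothetical, and the policy is written as the dict a YAML loader might return.

```python
import re
from dataclasses import dataclass, field
from typing import Callable

# Hypothetical policy, shown as the dict a YAML loader might produce.
# Key names are illustrative only, not AgentWard's real schema.
POLICY = {
    "default": "BLOCK",  # anything not listed is denied
    "tools": {
        "read_file": "ALLOW",
        "send_email": "APPROVE",
        "web_search": "REDACT",
    },
    # Pairs that are low-risk alone but form an exfiltration path together.
    "risky_pairs": [("send_email", "web_browser")],
}

# Toy pattern for API-key-like strings (an assumption for this sketch).
SECRET = re.compile(r"sk-[A-Za-z0-9]{8,}")


def risky_combinations(enabled, pairs):
    """Return the configured pairs whose two members are both enabled."""
    enabled = set(enabled)
    return [pair for pair in pairs if set(pair) <= enabled]


@dataclass
class Proxy:
    """Sits between the agent and its tools; the model never sees this code."""
    policy: dict
    tools: dict                            # tool name -> callable taking one str
    approver: Callable[[str, str], bool]   # e.g. a human confirmation prompt
    audit_log: list = field(default_factory=list)

    def call(self, name, arg):
        action = self.policy["tools"].get(name, self.policy["default"])
        self.audit_log.append((name, action))  # log every attempt, allowed or not
        # Fail closed: BLOCK and any unrecognized action are both denied.
        if action not in {"ALLOW", "APPROVE", "REDACT"}:
            raise PermissionError(f"{name} blocked by policy")
        if action == "APPROVE" and not self.approver(name, arg):
            raise PermissionError(f"{name} denied by approver")
        if action == "REDACT":
            arg = SECRET.sub("[REDACTED]", arg)  # scrub before the tool runs
        return self.tools[name](arg)
```

The point of the design is that the decision is made by ordinary code on the tool-call path, so a prompt injection that changes what the model asks for cannot change what the policy permits.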

Getting started is one command:

agentward init

It scans, shows your risk profile, and wraps your environment with a sensible default policy in under two minutes.

Honest caveats: Currently tested on OpenClaw skills and Mac only. MCP server support and Windows are on the roadmap; contributions welcome. This is early and rough in places, but the core enforcement works.

I'm sharing it now because the problem is real and getting worse fast. Would love feedback from anyone running agents in production.

GitHub: github.com/agentward-ai/agentward