Castor – a secure execution layer for LLM agents
Kernel interception stops runaway agents where LangGraph and AutoGen only advise.
Hardening pipelines to protect LLMs from untrusted content
Lifecycle-aware security pipeline, not point tools—shared context from ingress through output.
LLM/AI application developers, agent framework maintainers
OPA (Open Policy Agent) · LangSmith (monitoring/debugging) · Guardrails AI (validation layer)
Inbound hardening: sanitize and structurally isolate untrusted content (web, email, docs, tool output) so it is treated as data, not instructions. Tool-call firewall: deny-by-default destructive operations unless explicitly authorized; fail-closed confirmation when no confirmation handler is wired. Request binding: bind (tool name, canonical args, message hash, TTL) to prevent replay and argument substitution. Exfiltration detection: scans outbound tool arguments for secret patterns and flags substantial verbatim overlap with recently ingested untrusted content. Provenance tracking: enforces stricter no-copy rules on content with known untrusted origin, independent of the overlap heuristic. Canary tokens: per-session canary generation and detection to catch prompt leakage into outputs. Source gating: blocks high-risk sources from being promoted into long-lived memory or KG extraction to reduce memory poisoning.
It is intentionally minimal and not framework-specific. It does not replace least-privilege credentials or sandboxing — it sits above them. Repo: https://github.com/mhcoen/guardllm I'd like feedback on: what threat model gaps you see; whether the default overlap thresholds are reasonable for summarization and quoting workflows; and which framework adapters would make this easiest to adopt (LangChain, OpenAI tool calling, MCP proxy, etc.).
Kernel interception stops runaway agents where LangGraph and AutoGen only advise.
Hardened Rust alternative to OpenClaw, but early (v0.1 preview, still rough edges).
The two-layer approach — a code plugin for gates/hardening plus a tiny ~1,230-token LLM skill for behavioral rules — is smart and practical. I appreciate that detection runs in bash (no token bloat) and that they mapped concrete checks to OWASP ASI and MITRE frameworks; the tradeoff is obvious: this is highly valuable if you run OpenClaw, but mostly irrelevant outside that ecosystem.
Eight-layer defense-in-depth for AI agents when Guardrails AI only handles inputs.
IFC + capabilities block prompt injection at execution sinks, not input filters—40yr research applied.
Spec compiler approach is interesting but GitHub Spec Kit and Kiro already cover this.