GitHub Repository

Hardening pipelines to protect LLMs from untrusted content

18 stars · Python

GuardLLM, hardened tool calls for LLM apps

by mhcoen·Feb 14, 2026·1 point·0 comments

AI Analysis

●●● Banger · Big Brain · Solve My Problem · Wizardry

A lifecycle-aware security pipeline rather than a set of point tools: context is shared from ingress through output.

Strengths
  • Architectural insight: treats security as a full data lifecycle problem, not isolated checks, enabling context-aware decisions downstream.
  • Concrete controls with measurable performance: 0.1ms processing, 100% coverage vs 61% from point tools, runs fully local without external APIs.
  • Addresses a real LLM security gap: existing defenses are either model-dependent (slow) or fragmented (OPA, Casbin, etc. don't share context).
Weaknesses
  • Early stage with no releases published; adoption will depend on integration with major agent frameworks (LangChain, LlamaIndex, etc.).
  • Python-only, which limits applicability in polyglot AI stacks; Node.js or Go equivalents would unlock broader use.
Category
Target Audience

LLM/AI application developers, agent framework maintainers

Similar To

OPA (Open Policy Agent) · LangSmith (monitoring/debugging) · Guardrails AI (validation layer)

Post Description

Most agent frameworks treat prompt injection as a model-level problem. In practice, once your agent ingests untrusted text and has tool access, you need application-layer controls — structural isolation, tool-call gating, exfiltration detection — that don't depend on the model behaving correctly. I built guardllm to provide those controls. guardllm is a small, auditable Python library that provides:

  • Inbound hardening: sanitize and structurally isolate untrusted content (web, email, docs, tool output) so it is treated as data, not instructions.
  • Tool-call firewall: deny-by-default destructive operations unless explicitly authorized; fail-closed confirmation when no confirmation handler is wired.
  • Request binding: bind (tool name, canonical args, message hash, TTL) to prevent replay and argument substitution.
  • Exfiltration detection: scans outbound tool arguments for secret patterns and flags substantial verbatim overlap with recently ingested untrusted content.
  • Provenance tracking: enforces stricter no-copy rules on content with known untrusted origin, independent of the overlap heuristic.
  • Canary tokens: per-session canary generation and detection to catch prompt leakage into outputs.
  • Source gating: blocks high-risk sources from being promoted into long-lived memory or KG extraction to reduce memory poisoning.
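As a rough illustration of the request-binding idea in the list above, here is a minimal sketch using only the Python standard library. It is not guardllm's API: the names `bind_request`, `verify_request`, `canonical_args`, and the `SESSION_KEY` value are hypothetical, and the HMAC construction is an assumption about how such a binding could be built. Note that a TTL only bounds the replay window; strict single-use would also need a nonce store, which is omitted here.

```python
import hashlib
import hmac
import json
import time

# Hypothetical per-session secret; a real deployment would generate this randomly.
SESSION_KEY = b"example-session-key"


def canonical_args(args: dict) -> str:
    """Serialize arguments deterministically so key order cannot change the binding."""
    return json.dumps(args, sort_keys=True, separators=(",", ":"))


def _tag(tool: str, args: dict, message: str, expires: int, key: bytes) -> str:
    """HMAC over (tool name, canonical args, message hash, expiry)."""
    msg_hash = hashlib.sha256(message.encode("utf-8")).hexdigest()
    payload = json.dumps([tool, canonical_args(args), msg_hash, expires])
    return hmac.new(key, payload.encode("utf-8"), hashlib.sha256).hexdigest()


def bind_request(tool, args, message, ttl_s, now=None, key=SESSION_KEY):
    """Create a binding at authorization time."""
    now = time.time() if now is None else now
    expires = int(now + ttl_s)
    return {"expires": expires, "tag": _tag(tool, args, message, expires, key)}


def verify_request(binding, tool, args, message, now=None, key=SESSION_KEY):
    """Check the binding at execution time; any mismatch or expiry fails closed."""
    now = time.time() if now is None else now
    if now > binding["expires"]:
        return False
    expected = _tag(tool, args, message, binding["expires"], key)
    return hmac.compare_digest(expected, binding["tag"])
```

In this sketch, a call authorized for `{"to": "a@example.com"}` fails verification if the arguments are later swapped to a different recipient, or if it is presented after the TTL has elapsed.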

It is intentionally minimal and not framework-specific. It does not replace least-privilege credentials or sandboxing; it sits above them.

Repo: https://github.com/mhcoen/guardllm

I'd like feedback on:
  • what threat model gaps you see;
  • whether the default overlap thresholds are reasonable for summarization and quoting workflows;
  • which framework adapters would make this easiest to adopt (LangChain, OpenAI tool calling, MCP proxy, etc.).
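To make the overlap-threshold question concrete, here is a minimal sketch of one way a verbatim-overlap check could work, using word n-gram "shingles". This is an assumption for illustration only, not guardllm's actual heuristic: the function names, the shingle size `n=8`, and the `threshold=0.5` default are all hypothetical.

```python
def shingles(text: str, n: int = 8) -> set:
    """Return the set of lowercase word n-grams in the text."""
    words = text.lower().split()
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}


def overlap_ratio(outbound: str, untrusted: str, n: int = 8) -> float:
    """Fraction of the outbound text's n-grams that appear verbatim in the untrusted text."""
    out = shingles(outbound, n)
    if not out:
        return 0.0  # outbound text is shorter than n words, so nothing to compare
    return len(out & shingles(untrusted, n)) / len(out)


def flags_exfiltration(outbound: str, recent_untrusted: list, n: int = 8,
                       threshold: float = 0.5) -> bool:
    """Flag an outbound argument if it substantially copies any recent untrusted document."""
    return any(overlap_ratio(outbound, doc, n) >= threshold for doc in recent_untrusted)
```

Under this kind of scheme, a paraphrased summary shares few long n-grams with its source and scores low, while a long direct quotation scores high, which is why the threshold and shingle size matter for quoting workflows.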

Similar Projects

Security · ●● Solid

SecureClaw – Open-Source Security Layer for OpenClaw Agents

The two-layer approach — a code plugin for gates/hardening plus a tiny ~1,230-token LLM skill for behavioral rules — is smart and practical. I appreciate that detection runs in bash (no token bloat) and that they mapped concrete checks to OWASP ASI and MITRE frameworks; the tradeoff is obvious: this is highly valuable if you run OpenClaw, but mostly irrelevant outside that ecosystem.

Niche Gem · Big Brain
alex_polyakov
213mo ago
Security · ●● Solid

AgentArmor – open-source 8-layer security framework for AI agents

Eight-layer defense-in-depth for AI agents when Guardrails AI only handles inputs.

Solve My Problem · Ship It
AgastyaTodi
1063mo ago
Security · ●●● Banger

MVAR – Deterministic sink enforcement for AI agent

IFC + capabilities block prompt injection at execution sinks, not input filters—40yr research applied.

Big Brain · Wizardry
ShawnC21
113mo ago