GitHub Repository

🛡️ Safe AI Agents through Action Classifier

10 stars · Python

Agent Action Guard – AI agent action safety

by praneeth-v · Apr 1, 2026 · 2 points · 0 comments

AI Analysis

●● Solid · Solve My Problem · Niche Gem

HarmActionsEval benchmark proves GPT and Claude fail at blocking harmful tool use.

Strengths
  • HarmActionsEval benchmark provides concrete failure metrics for existing models.
  • PyPI package means drop-in integration without architectural changes.
  • Action classifier approach intercepts before execution, not after damage.
Weaknesses
  • AI agent guardrails space is crowded with Guardrails AI, Lakera, and others.
  • Only 7 stars suggests limited real-world testing or adoption so far.
Target Audience

AI agent developers, ML engineers building autonomous systems

Similar To

Guardrails AI · Lakera Guard · Rebuff

Post Description

Your agents may be able to perform harmful actions with nothing to stop them, and you may not know it yet. In the HarmActionBench experiments, AI agents were allowed to use tools in response to harmful instructions, and the results are shocking. Even the latest popular AI models, including GPT and Claude, scored very low. They showed no barriers against performing harmful actions.

HarmActionsEval proves AI is not yet reliable enough for critical projects. Agent Action Guard blocks harmful actions. GitHub: https://github.com/Pro-GenAI/Agent-Action-Guard
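
To make "blocking harmful actions" concrete, here is a minimal sketch of the general pattern: classify a proposed tool call before running it, and refuse to run it if it is judged harmful. This is not the Agent Action Guard API. The names `classify_action`, `guarded_call`, `ActionBlocked`, and `BLOCKED_MARKERS` are hypothetical, and the keyword rule is only a stand-in for a trained action classifier.

```python
from typing import Any, Callable, Dict

# Hypothetical stand-in for a trained action classifier. This is an
# assumption for illustration, NOT the Agent Action Guard API.
BLOCKED_MARKERS = ("rm -rf", "drop table", "transfer_funds")


class ActionBlocked(Exception):
    """Raised when a proposed tool call is classified as harmful."""


def classify_action(tool_name: str, arguments: Dict[str, Any]) -> str:
    """Return "harmful" or "safe" for a proposed tool call."""
    text = f"{tool_name} {arguments}".lower()
    return "harmful" if any(m in text for m in BLOCKED_MARKERS) else "safe"


def guarded_call(tool: Callable[..., Any], tool_name: str,
                 arguments: Dict[str, Any]) -> Any:
    """Classify the action first; run the tool only if it is judged safe."""
    if classify_action(tool_name, arguments) == "harmful":
        raise ActionBlocked(f"blocked call to {tool_name}")
    return tool(**arguments)


def run_shell(command: str) -> str:
    """Toy tool that only echoes its input; it never executes anything."""
    return f"ran: {command}"


if __name__ == "__main__":
    print(guarded_call(run_shell, "run_shell", {"command": "ls"}))
    try:
        guarded_call(run_shell, "run_shell", {"command": "rm -rf /"})
    except ActionBlocked as exc:
        print(exc)
```

The check sits between the agent's decision and the tool itself, so a harmful call is rejected before it has any side effects.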

I would love to discuss possible use cases in your projects and future directions. Your input helps expand the dataset, model, and benchmark. Please join the discussion at https://github.com/Pro-GenAI/Agent-Action-Guard/discussions/....
