GitHub Repository

First formal security scanner for AI agent skills & plugins. Static analysis, supply chain verification, SBOM generation. 22 frameworks supported including MCP, LangChain, CrewAI.

24 stars · Python

SkillFortify, a formal verification for AI agent skills

Author: varunpratap369

by varunpratap369 · Feb 26, 2026 · 2 points · 2 comments


AI Analysis

●● Solid · Big Brain · Niche Gem

Formal verification for agent skills, for cases where heuristic scanners cannot rule out risk.

Strengths

• Mathematical soundness guarantees via five proven theorems, not heuristic guessing
• Zero false positives on the benchmark suite; the fail-safe architecture blocks by default
• Multi-format skill detection (Claude Code skills, MCP servers, OpenClaw manifests), with CLI discovery and lock files for reproducible configurations

Weaknesses

• Narrow TAM: only matters if you deploy untrusted third-party skills to agents
• Early stage with minimal adoption signals, no public benchmarks against real malicious skills

Post Description

Hi HN,

In January 2026, 1,200 malicious skills infiltrated the OpenClaw agent marketplace (ClawHavoc campaign). A month later, researchers catalogued 6,487 malicious agent tools that VirusTotal cannot detect. The first agent-software RCE was assigned CVE-2026-25253.

The response: a dozen heuristic scanning tools (pattern matching, LLM-as-judge, YARA rules). They all carry the same caveat: "no findings does not mean no risk."

SkillFortify takes a different approach. Instead of checking for known bad patterns, it formally verifies what a skill CAN do against what it CLAIMS to do. Five mathematical theorems guarantee soundness -- if SkillFortify says a skill is safe, it provably cannot exceed its declared capabilities.
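To make the "CAN do versus CLAIMS to do" idea concrete, here is a minimal Python sketch of a capability-containment check. It is not SkillFortify's formal model or API: the `Skill` class, the capability strings, and the premise that some earlier analysis has already produced the `inferred` set are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Skill:
    """Hypothetical record: what a skill declares vs. what analysis inferred."""
    name: str
    declared: frozenset  # capabilities the skill CLAIMS to need
    inferred: frozenset  # capabilities an analysis says it CAN exercise


def excess_capabilities(skill: Skill) -> frozenset:
    """Capabilities the skill can exercise but never declared."""
    return skill.inferred - skill.declared


def is_within_declaration(skill: Skill) -> bool:
    """True only if every inferred capability is covered by the declaration."""
    return not excess_capabilities(skill)


# Illustrative capability names, not taken from any real manifest format.
formatter = Skill(
    name="markdown-formatter",
    declared=frozenset({"fs.read", "fs.write"}),
    inferred=frozenset({"fs.read", "fs.write"}),
)
exfiltrator = Skill(
    name="helpful-linter",
    declared=frozenset({"fs.read"}),
    inferred=frozenset({"fs.read", "net.connect"}),
)

print(is_within_declaration(formatter))         # True
print(sorted(excess_capabilities(exfiltrator)))  # ['net.connect']
```

A check like this is only as strong as the analysis that produces `inferred`: a soundness guarantee means that set over-approximates real behavior, so anything missing from it is something the skill cannot do.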

What it does:
- skillfortify scan . -- discover and analyze all skills in a project
- skillfortify verify skill.md -- formally verify against capability declaration
- skillfortify lock -- generate skill-lock.json for reproducible configs
- skillfortify trust skill.md -- compute trust score (provenance + behavior)
- skillfortify sbom -- CycloneDX 1.6 Agent Skill Bill of Materials
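The post does not describe the skill-lock.json format. As a rough illustration of why a lock file helps with reproducibility, the Python sketch below pins each skill to a SHA-256 digest of its content and reports anything that has drifted since. The function names and the flat name-to-digest layout are hypothetical and are not SkillFortify's schema.

```python
import hashlib
import json


def build_lock(skills: dict[str, bytes]) -> dict[str, str]:
    """Map each skill name to the SHA-256 hex digest of its content."""
    return {
        name: hashlib.sha256(content).hexdigest()
        for name, content in sorted(skills.items())
    }


def changed_skills(lock: dict[str, str], skills: dict[str, bytes]) -> list[str]:
    """Names that were added, removed, or modified relative to the lock."""
    current = build_lock(skills)
    names = set(lock) | set(current)
    return sorted(n for n in names if lock.get(n) != current.get(n))


# In-memory stand-ins for skill files, so the example needs no disk access.
original = {"skill.md": b"# Formatter\nReads and writes files.\n"}
lock = build_lock(original)
print(json.dumps(lock, indent=2))

tampered = {"skill.md": b"# Formatter\nReads files and opens sockets.\n"}
print(changed_skills(lock, original))  # []
print(changed_skills(lock, tampered))  # ['skill.md']
```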

Supports Claude Code skills, MCP servers, and OpenClaw manifests.

Evaluated on 540 skills (270 malicious, 270 benign): F1=96.95%, zero false positives.
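The post reports only the F1 score and the false-positive count. Assuming F1 is computed over the malicious class (an assumption; the post does not say), zero false positives means precision is 1, so the score is determined by how many of the 270 malicious skills were caught. The Python sketch below searches for the true-positive count consistent with the reported figure; the result is an inference from the stated numbers, not a number given in the post.

```python
def f1_score(tp: int, fp: int, fn: int) -> float:
    """Standard F1: harmonic mean of precision and recall."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


MALICIOUS = 270      # malicious skills in the benchmark, per the post
REPORTED_F1 = 96.95  # percent, per the post
FALSE_POSITIVES = 0  # per the post

# Try every possible number of detected malicious skills.
matches = [
    tp
    for tp in range(1, MALICIOUS + 1)
    if round(100 * f1_score(tp, FALSE_POSITIVES, MALICIOUS - tp), 2) == REPORTED_F1
]

print(matches)                 # [254]
print(MALICIOUS - matches[0])  # 16 malicious skills missed
```

Under that assumption, the reported figures correspond to 254 of 270 malicious skills detected (recall of about 94.1%) and 16 missed.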

Paper: [ZENODO_DOI_URL]
Install: pip install skillfortify
Code: https://github.com/varun369/skillfortify

Built as part of the AgentAssert research suite. Happy to answer questions about the formal model, threat landscape, or benchmark methodology.

Similar Projects

Security · ●●● Banger

SkillFortify, Formal verification for AI agents (auto-discovers)

Formal verification guarantees for agent skills replace heuristic scanning's 'no findings ≠ no risk' caveat.

Big Brain · Zero to One · Solve My Problem

varunpratap369

213mo ago

Developer Tools · ●● Solid

TLA+ Workbench skill for coding agents (compat. with Vercel skills CLI)

TLA+ code generation for agents, but the audience is tiny: it is only useful if your agent needs formal verification.

Niche Gem · Big Brain

youio

4143mo ago

Developer Tools · ●● Solid

Lemmafit: A Verifier in the AI Loop

Dafny + Claude Code creates provably correct React logic, but limited to greenfield projects.

Big Brain · Bold Bet

namin

753mo ago

Developer Tools · ●●● Banger

Formal – Formal verification for AI-generated code using Lean 4

Lean 4 proofs for AI code correctness—way more rigorous than unit tests.

Big Brain · Wizardry · Zero to One

yamafaktory

442mo ago

Developer Tools · ●●● Banger

Verity, Formally verified smart contracts from spec to bytecode

Formally verified EVM bytecode with zero sorries—actually ships working proofs.

Wizardry · Big Brain · Zero to One

th0rgal2

103mo ago

Security · ●● Solid

IC-AGI – Threshold auth for AI agents, formally verified in TLA+

Impressively concrete safety architecture: K-of-N threshold approval via Shamir SSS, capability tokens with TTL/scope/consumable budgets, an append-only audit ledger and shard-isolated workers all backed by TLA+ proofs for many properties. It reads like a research-to-prototype push — there's real formal rigor and test counts shown — but the repo looks early-stage and would benefit from runnable demos, deployment examples, and clearer integration docs before I'd recommend it for production.

Big Brain · Bold Bet · Niche Gem

saezbaldo

223mo ago