Signed receipts for agent actions
Ed25519-signed receipts make AI agent actions accountable across organizational boundaries.
Tamper-proof execution sandbox for trustworthy AI coding-agent benchmarks
Signed isolation bundles stop agents from reading test files or fetching solutions with curl.
AI researchers, benchmark creators, ML engineers
SWE-bench · MLE-bench · Docker
Unsupervised bug benchmark that uses agents as both attackers and defenders, with a novel scoring methodology.
Expands the corpus to 16 CVE-anchored scenarios to break ties between models.
Team-wide memory pool for agents, where most tools keep memory siloed on a single workstation.
VM isolation for coding agents, offering stronger environment separation than container-based sandboxing.
First benchmark testing structured requirements on complex greenfield agent tasks.