GitHub Repository

Tamper-proof execution sandbox for trustworthy AI coding-agent benchmarks

0 stars · Rust

Proctor – signed isolation bundles for AI coding-agent benchmarks

by dp12 · Jun 23, 2026 · 3 points · 0 comments

AI Analysis

●●● Banger · Big Brain · Solve My Problem

Signed isolation bundles stop agents from reading test files or curling solutions.

Strengths
  • Linux namespaces block filesystem reads and network egress attempts by construction.
  • Signed verdict bundles make benchmark results tamper-evident and independently verifiable.
  • Logs specific violation attempts like masked-file reads for easier debugging.
Weaknesses
  • Cannot prevent cheating via injected answer keys in the agent scaffold.
  • Niche utility limited to teams building or validating their own AI benchmarks.
Target Audience

AI researchers, benchmark creators, ML engineers

Similar To

SWE-bench · MLE-bench · Docker

Similar Projects

AI/ML · ●●● Banger

Signed receipts for agent actions

Ed25519 signed receipts solve AI agent accountability across org boundaries.

Zero to One · Big Brain
jithinraj
20 · 3mo ago
AI/ML · ●● Solid

Agentic Intent Benchmark

First benchmark testing structured requirements on complex greenfield agent tasks.

Niche Gem · Big Brain
ryan4rtmx
2027d ago