Back to browse
GitHub Repository

Pre-Execution Gate for AI Code. A deterministic, gradient-immune structural guard against reward hacking and hardcoding in RL training loops.

1 starsPython

AST-guard A gradient-immune structural guard against RL reward hacking

by thinking-nick·Jun 29, 2026·3 points·0 comments

AI Analysis

●●SolidBig BrainNiche Gem

Gradient-immune AST analysis that RL models can't optimize against through backpropagation.

Strengths
  • Deterministic structural analysis means zero false positives on known hack patterns.
  • Empirically validated in actual RL training loops, not just theoretical.
  • Sub-10ms latency makes it viable as a real pre-execution gate.
Weaknesses
  • Explicitly experimental research artifact, not production-ready.
  • Only catches structural hacks—semantic bypasses require escalation to other tools.
Category
Target Audience

AI safety researchers, RL engineers training code-generation models

Similar To

TRACE · RewardHackWatch · EvilGenie

Similar Projects

AI/ML●●Solid

RewardGuard – detect reward hacking in RL training loops

Catches reward hacking before it tanks your RL training run.

Niche GemBig Brain
Giovan321
112mo ago
EducationMid

rlvrbook

Educational content in a space where Nathan Lambert's RLHF book already exists.

Niche Gem
kyars
112mo ago