GitHub Repository

Benchmark harness measuring AI coding tool+workflow performance, not just model capability. 100 tasks, sigmoid scoring, 12 capability dimensions, gap analysis.

10 stars · Python

AWB – Benchmark that tests your AI coding workflow, not just the model

by xmpuspus·Mar 22, 2026·1 point·0 comments

AI Analysis

●●● Banger · Big Brain · Zero to One

Tests workflow, tool, and model together, rather than model capability alone as SWE-bench does.

Strengths
  • 80 tasks from real open-source repos with pinned commits
  • 7 scoring dimensions including security, cost, and reliability
  • Sigmoid normalization prevents score collapse at boundaries
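
To illustrate the sigmoid point, here is a minimal sketch of how a sigmoid-normalized score differs from a clamped linear one. It is not taken from the AWB codebase: the function names, the `midpoint` and `steepness` parameters, and the sample values are all assumptions made for illustration.

```python
import math


def sigmoid_score(raw: float, midpoint: float, steepness: float = 1.0) -> float:
    """Map a raw metric onto (0, 1); a raw value equal to `midpoint` scores 0.5.

    Hypothetical helper: `midpoint` and `steepness` are illustrative
    parameters, not names from the AWB repository.
    """
    z = steepness * (raw - midpoint)
    # Branch on the sign of z so math.exp never receives a large positive argument.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def clamped_linear_score(raw: float, low: float, high: float) -> float:
    """Linear rescaling clipped to [0, 1], shown only for contrast."""
    return min(1.0, max(0.0, (raw - low) / (high - low)))


if __name__ == "__main__":
    # Two results that both exceed the linear upper bound of 10.
    for raw in (12.0, 20.0):
        print(raw, clamped_linear_score(raw, 0.0, 10.0), sigmoid_score(raw, 5.0, 0.5))
```

With the clamped linear score, both sample values collapse to the same maximum, so the difference between them is lost. The sigmoid compresses values near the extremes but keeps their ordering, which is one plausible reading of "prevents score collapse at boundaries".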
Weaknesses
  • Only 10 stars, so community validation is still minimal
  • Workflow benchmarking category will attract competitors quickly
Category
Target Audience

Engineering teams evaluating AI coding tools and workflows

Similar To

SWE-bench · Aider benchmarks · Claude Code eval tools

Similar Projects

AI/ML · ●● Solid

Agentic Intent Benchmark

First benchmark testing structured requirements on complex greenfield agent tasks.

Niche Gem · Big Brain
ryan4rtmx
2019d ago