CADBench – every AI CAD tool I tested fails on basic mechanical parts

by ryanrana · May 9, 2026 · 1 point · 0 comments

AI Analysis

●●● Banger · Big Brain · Dark Horse

Documents mesh-to-BREP failure modes using IRT-calibrated scores across a 28-task pilot suite.

Strengths

• Calibrating agent ability (θ) against task difficulty with a 2PL IRT model gives statistically grounded rankings
• Pareto frontier visualization shows capability vs. cost tradeoffs for production decision-making
• Human baseline (n=4 senior engineers) establishes a realistic ceiling for AI agent performance
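
For readers unfamiliar with the 2PL IRT model mentioned above, the sketch below shows the standard two-parameter logistic item response function and a simple grid-search estimate of ability θ. It is only an illustration under stated assumptions: the task parameters in `ITEMS`, the pass/fail vectors, and the function names are hypothetical, and nothing here is taken from CADBench's actual calibration code.

```python
import math

def p_correct(theta, a, b):
    """2PL item response function: probability that an agent with ability
    theta passes a task with discrimination a and difficulty b."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

def log_likelihood(theta, items, responses):
    """Log-likelihood of a pass/fail vector given ability theta."""
    total = 0.0
    for (a, b), passed in zip(items, responses):
        p = p_correct(theta, a, b)
        total += math.log(p) if passed else math.log(1.0 - p)
    return total

def estimate_theta(items, responses, lo=-4.0, hi=4.0, steps=801):
    """Maximum-likelihood ability estimate via a plain grid search."""
    grid = [lo + i * (hi - lo) / (steps - 1) for i in range(steps)]
    return max(grid, key=lambda t: log_likelihood(t, items, responses))

# Hypothetical (discrimination, difficulty) pairs for four tasks.
ITEMS = [(1.0, -1.0), (1.2, 0.0), (0.8, 1.0), (1.5, 2.0)]

# Hypothetical pass (1) / fail (0) results for two agents.
strong = estimate_theta(ITEMS, [1, 1, 1, 0])
weak = estimate_theta(ITEMS, [1, 0, 0, 0])
print(f"strong agent theta = {strong:.2f}, weak agent theta = {weak:.2f}")
```

Under this model, passing harder and more discriminating tasks raises θ more than passing easy ones, which is why IRT-based rankings can differ from raw pass rates.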

Weaknesses

• Pilot subset of 28 tasks may not capture the full complexity of real-world CAD workflows
• Cost metrics favor API-based agents over local inference, skewing economic comparisons

Category

Target Audience

CAD software developers, AI researchers, mechanical engineers

Similar To

HumanEval · BigCodeBench · SWE-bench

Similar Projects

AI/ML · ●● Solid

FretBench – I tested 14 LLMs on reading guitar tabs. Most failed

Clever benchmark exposing LLM tokenization weakness on ASCII art, but narrow domain.

Big Brain · Niche Gem

jmcapra

104mo ago

Developer Tools · ●●● Banger

Trying to fix the web scraping industry's benchmark problem

Finally exposes vendor BS by ranking scrapers on hard targets like Amazon and Cloudflare.

Big Brain · Niche Gem · Ship It

rohitshenoy

1804d ago

Education · ●● Solid

Going from 1+1=2 to Quantum Mechanics

Ambitious curriculum bridging basic arithmetic to quantum mechanics without skipping steps.

Cozy · Rabbit Hole · Big Brain

chaidhat

13121mo ago

AI/ML · ●●● Banger

TexoCAD – Describe a part in words, get actual editable CAD

Editable BREP output beats mesh generators—download the code and keep building.

Zero to One · Solve My Problem

torayeff

133mo ago

Design · ● Mid

CADara - I made an open-source in-browser CAD

Another browser CAD, but v0.0.1 lacks features to compete with Onshape.

Ship It

ttouch

37132mo ago

Security · ●●● Banger

Deep-XPIA – Prompt injection benchmark for multi-agent AI systems

Maps cross-agent injection attacks to real Copilot CVEs with live measurements.

Big Brain · Dark Horse

leo_agent

301mo ago