STAR prompting fixes Car Wash Problem on Sonnet 4.5 (0%->85%)
They ran a variable-isolation study across five prompt layers with 20 runs per condition and shipped experiment.py and results so you can reproduce which layer actually supplies the missing implicit fact. It’s a focused, practical read for anyone designing layered system prompts, but it feels niche and would be more persuasive with cross-model baselines and clearer statistical reporting.
