Spec-shaker – "Chaos engineering" for tests via semantic mutation

Author: lydiazbaziny

by lydiazbaziny · Mar 3, 2026 · 1 point · 0 comments


AI Analysis

●●● Banger · Big Brain · Solve My Problem

LLM-generated semantic mutants (off-by-one bugs, swallowed errors) aim to expose test gaps that mechanical operator-swap mutation testing misses.

Strengths

• Semantic mutations (missing side effects, boundary errors) are meaningfully harder to catch than syntactic swaps.
• Multi-stage showcase with a URL shortener demonstrates real test gaps and iterative improvement, which gives concrete evidence.
• Directly addresses 'vibe code' risk: AI-generated specs and tests that ship without validation now have an automated checker.

Weaknesses

• CI integration and cost model are undefined; running LLM generation per mutant could be expensive at scale.
• Classic mutation testing already exists (mutmut, Stryker); the semantic advantage is unproven on large real-world codebases.

Post Description

Hi HN — I built spec-shaker, a tool/skill for testing your tests.

Instead of classic mutation testing that does mechanical swaps (>→>=), spec-shaker uses an LLM to generate semantically broken implementations — realistic bugs like swallowed errors, missing side effects, off-by-one expiration boundaries, etc. It runs your test suite against each mutant and reports which mutants were killed vs survived. Survivors usually point to gaps in spec/assertions.

There’s a small demo (a URL shortener) that shows how survivors guide spec + test improvements across iterations.

I’d love feedback on:
* whether “semantic mutants” are useful vs classic mutation testing
* how you’d run this in CI (budgets, sampling, scoring)
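As one possible starting point for the CI question, the sketch below caps the number of mutants per run with a seeded sample and gates the build on a mutation-score threshold. This is an assumption about how budgets, sampling, and scoring could fit together, not something spec-shaker is known to implement; the function names and the threshold are illustrative.

```python
import random
from typing import List, Sequence

def sample_mutants(mutant_ids: Sequence[str], budget: int, seed: int) -> List[str]:
    """Pick at most `budget` mutants, reproducibly for a given seed."""
    rng = random.Random(seed)
    k = min(budget, len(mutant_ids))
    return sorted(rng.sample(list(mutant_ids), k))

def mutation_score(killed: int, total: int) -> float:
    """Fraction of sampled mutants killed; 1.0 when nothing was sampled."""
    return killed / total if total else 1.0

def ci_gate(killed: int, total: int, threshold: float) -> bool:
    """Pass the build only if the mutation score meets the threshold."""
    return mutation_score(killed, total) >= threshold
```

Seeding the sample (for example, from the commit hash) keeps a rerun of the same commit comparable, while the budget bounds how many LLM generations and test-suite runs each pipeline pays for.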