GitHub Repository
0 stars · Python

Spec-shaker – "Chaos engineering" for tests via semantic mutation

by lydiazbaziny·Mar 3, 2026·1 point·0 comments

AI Analysis

●●● Banger · Big Brain · Solve My Problem

LLM-driven semantic mutants (off-by-one bugs, swallowed errors) aim to expose test gaps that mechanical swap mutation testing misses.

Strengths
  • Semantic mutations (missing side effects, boundary errors) are meaningfully harder to catch than syntactic swaps.
  • The multi-stage URL shortener showcase gives concrete evidence: it surfaces real test gaps and shows iterative improvement.
  • Directly addresses 'vibe code' risk: AI-generated specs + tests without validation now have an automated checker.
Weaknesses
  • CI integration and cost model undefined; running LLM generation per mutant could be expensive at scale.
  • Classic mutation testing already exists (mutmut, Stryker); semantic advantage unproven on real large codebases.
Target Audience

Test engineers, QA leads, teams using AI-generated code without rigorous test review.

Similar To

Stryker · mutmut · PIT (Pitest)

Post Description

Hi HN — I built spec-shaker, a tool/skill for testing your tests.

Instead of classic mutation testing that does mechanical swaps (>→>=), spec-shaker uses an LLM to generate semantically broken implementations — realistic bugs like swallowed errors, missing side effects, off-by-one expiration boundaries, etc. It runs your test suite against each mutant and reports which mutants were killed vs survived. Survivors usually point to gaps in spec/assertions.
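The kill/survive loop described above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not spec-shaker's actual code or API: the `Shortener` class, the three mutant classes, the `check_*` functions, and `run_mutants` are all hypothetical names, and the mutants are hand-written here where the real tool would ask an LLM to generate them.

```python
from typing import Callable, Dict, List, Tuple, Type


class Shortener:
    """Hypothetical reference implementation of a tiny URL shortener."""

    def __init__(self, ttl: int = 60) -> None:
        self.ttl = ttl
        self.links: Dict[str, Tuple[str, int]] = {}
        self.hits: Dict[str, int] = {}

    def shorten(self, code: str, url: str, now: int) -> None:
        self.links[code] = (url, now)
        self.hits[code] = 0

    def resolve(self, code: str, now: int) -> str:
        url, created = self.links[code]  # KeyError for unknown codes
        if now - created >= self.ttl:    # expired at exactly ttl seconds
            raise KeyError(code)
        self.hits[code] += 1             # side effect: count the visit
        return url


class SwallowedError(Shortener):
    """Mutant: lookup failures are silently turned into an empty string."""

    def resolve(self, code: str, now: int) -> str:
        try:
            return super().resolve(code, now)
        except KeyError:
            return ""


class MissingSideEffect(Shortener):
    """Mutant: resolves correctly but never updates the hit counter."""

    def resolve(self, code: str, now: int) -> str:
        url, created = self.links[code]
        if now - created >= self.ttl:
            raise KeyError(code)
        return url


class OffByOneExpiry(Shortener):
    """Mutant: the link is still served at exactly ttl seconds."""

    def resolve(self, code: str, now: int) -> str:
        url, created = self.links[code]
        if now - created > self.ttl:
            raise KeyError(code)
        self.hits[code] += 1
        return url


def check_roundtrip(impl: Type[Shortener]) -> None:
    s = impl(ttl=60)
    s.shorten("abc", "https://example.com", now=0)
    assert s.resolve("abc", now=10) == "https://example.com"


def check_unknown_code_raises(impl: Type[Shortener]) -> None:
    s = impl(ttl=60)
    try:
        s.resolve("nope", now=0)
    except KeyError:
        return
    raise AssertionError("expected KeyError for an unknown code")


def run_mutants(
    mutants: List[Type[Shortener]],
    checks: List[Callable[[Type[Shortener]], None]],
) -> Dict[str, str]:
    """Run every check against every mutant; any failing check kills it."""
    report: Dict[str, str] = {}
    for mutant in mutants:
        killed = False
        for check in checks:
            try:
                check(mutant)
            except Exception:
                killed = True
                break
        report[mutant.__name__] = "killed" if killed else "survived"
    return report


if __name__ == "__main__":
    suite = [check_roundtrip, check_unknown_code_raises]
    print(run_mutants([SwallowedError, MissingSideEffect, OffByOneExpiry], suite))
    # {'SwallowedError': 'killed', 'MissingSideEffect': 'survived',
    #  'OffByOneExpiry': 'survived'}
```

In this sketch the two survivors point at missing assertions: nothing checks the hit counter, and nothing probes the expiry boundary at exactly `ttl`.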

There’s a small demo (a URL shortener) that shows how survivors guide spec + test improvements across iterations.

I’d love feedback on:
  • whether “semantic mutants” are useful vs classic mutation testing
  • how you’d run this in CI (budgets, sampling, scoring)
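As one possible starting point for the CI question, here is a minimal sketch of budgeted sampling plus a score gate. Everything in it is an assumption for discussion: `sample_mutants`, `mutation_score`, and `ci_gate` are hypothetical helpers, not part of spec-shaker, and the score is the conventional killed-over-evaluated ratio.

```python
import random
from typing import Dict, List, Sequence


def sample_mutants(mutant_ids: Sequence[str], budget: int, seed: int) -> List[str]:
    """Pick at most `budget` mutants; a fixed seed keeps CI runs reproducible."""
    if budget >= len(mutant_ids):
        return list(mutant_ids)
    return random.Random(seed).sample(list(mutant_ids), budget)


def mutation_score(results: Dict[str, bool]) -> float:
    """Fraction of evaluated mutants that were killed (True means killed)."""
    if not results:
        return 0.0  # nothing evaluated, so nothing was demonstrated
    return sum(results.values()) / len(results)


def ci_gate(results: Dict[str, bool], threshold: float) -> bool:
    """Pass the build only if the mutation score meets the threshold."""
    return mutation_score(results) >= threshold


if __name__ == "__main__":
    chosen = sample_mutants([f"m{i}" for i in range(20)], budget=4, seed=1)
    print(len(chosen))  # 4
    results = {"m1": True, "m2": False, "m3": True, "m4": True}
    print(mutation_score(results))          # 0.75
    print(ci_gate(results, threshold=0.8))  # False
```

Sampling caps the number of test-suite runs per build, though it does not avoid the cost of generating the mutants in the first place; caching generated mutants between builds would be a separate design choice.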
