AthenaFlow – it browses your app, then writes Playwright tests
Self-healing tests via semantic IDs instead of selectors, but needs proof it beats Copilot test gen.
LLM-driven semantic mutants (off-by-one bugs, swallowed errors) beat mechanical swap mutation testing.
Test engineers, QA leads, teams using AI-generated code without rigorous test review.
Stryker · mutmut · PIT (Pitest)
Instead of classic mutation testing that does mechanical swaps (>→>=), spec-shaker uses an LLM to generate semantically broken implementations — realistic bugs like swallowed errors, missing side effects, off-by-one expiration boundaries, etc. It runs your test suite against each mutant and reports which mutants were killed vs survived. Survivors usually point to gaps in spec/assertions.
There’s a small demo (a URL shortener) that shows how survivors guide spec + test improvements across iterations.
I’d love feedback on: * whether “semantic mutants” are useful vs classic mutation testing * how you’d run this in CI (budgets, sampling, scoring)
Self-healing tests via semantic IDs instead of selectors, but needs proof it beats Copilot test gen.
Compresses 28M tokens to 100k queryable chars local-only; duplicates RAG problems at smaller scale.
Deterministic graphs instead of vector embeddings sound clever, but long-context windows and RAG tools already solve this problem cheaper.
5.8M artworks searchable by concept, style, and 5000-year time range.
Yet another versioning tool with no clear differentiation from semantic-release.
Vector search on SQLite alone hits 21ms p.95 — no vector database required.