Mutatr – an open source A/B testing agent
Synthetic personas simulate attention heatmaps before you ship to real users.
An agent loop that builds and tests web prototypes
Persona-driven critique loop is clever, but being locked to pi.dev limits adoption.
Product developers using AI coding agents, UX researchers
UserTesting · Maze · Hotjar
I've been thinking about how product development will change with AI. The earliest stages of product development have so much ambiguity. Because code was costly to write, we spent a lot of time writing specs and doing user research.
I thought I'd try an experiment after (a) seeing advancements in evaluation systems, especially for UX, (b) realizing that AI can create a reasonably good enrichment of a persona/end user, and (c) seeing Karpathy's autoresearch project.
Basically you provide a high-level app idea and a definition of the target user. Autocrit creates a persona definition, has that persona create evaluation tasks, and then starts a loop where a coding agent builds a prototype and a persona agent tries to use it in a real browser. The persona judges the prototype against the tasks, giving scores and verbatim feedback. The coding agent then creates a plan to fix things, keeps the changes that improve the scores, and reverts the ones that don't. The loop runs overnight.
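To make the keep-or-revert idea concrete, here is a minimal sketch of that kind of loop. It is not Autocrit's actual code: `run_loop`, `Evaluation`, `toy_build`, `toy_evaluate`, and `TASKS` are all hypothetical names, the "prototype" is just a set of feature strings, and the "persona" is a hard-coded checklist standing in for a browser-driving agent. The only thing it is meant to show is the control flow: evaluate, build a candidate from the feedback, keep it if the score improves, otherwise fall back to the previous best.

```python
from dataclasses import dataclass
from typing import Callable, FrozenSet, List, Tuple

Prototype = FrozenSet[str]  # assumption: a prototype is just a set of features


@dataclass
class Evaluation:
    scores: List[float]   # one score per evaluation task
    feedback: List[str]   # verbatim comments from the persona

    @property
    def mean_score(self) -> float:
        return sum(self.scores) / len(self.scores) if self.scores else 0.0


def run_loop(
    initial: Prototype,
    build: Callable[[Prototype, List[str]], Prototype],
    evaluate: Callable[[Prototype], Evaluation],
    iterations: int,
) -> Tuple[Prototype, List[float]]:
    """Keep a candidate only if it scores strictly higher than the current best."""
    best, best_eval = initial, evaluate(initial)
    history = [best_eval.mean_score]
    for _ in range(iterations):
        candidate = build(best, best_eval.feedback)
        candidate_eval = evaluate(candidate)
        if candidate_eval.mean_score > best_eval.mean_score:
            best, best_eval = candidate, candidate_eval  # keep the improvement
        # otherwise: revert, i.e. the next round starts from the old best
        history.append(best_eval.mean_score)
    return best, history


# Toy stand-ins (hypothetical): each task passes if a required feature exists.
TASKS = {"sign up": "signup_form", "search": "search_box"}


def toy_evaluate(prototype: Prototype) -> Evaluation:
    scores, feedback = [], []
    for task, needed in TASKS.items():
        if needed in prototype:
            scores.append(1.0)
        else:
            scores.append(0.0)
            feedback.append(f"Could not {task}: missing {needed}")
    return Evaluation(scores, feedback)


def toy_build(prototype: Prototype, feedback: List[str]) -> Prototype:
    if not feedback:
        return prototype  # nothing to fix
    missing_feature = feedback[0].rsplit(" ", 1)[-1]
    return prototype | {missing_feature}  # address the first complaint


if __name__ == "__main__":
    final, history = run_loop(frozenset(), toy_build, toy_evaluate, iterations=3)
    print(sorted(final), history)
```

In a real setup `build` would call a coding agent and `evaluate` would drive a browser as the persona; the strict `>` comparison is one possible acceptance rule, chosen here so that a change which does not help is never kept.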
The goal is to get a better understanding of where to take the product at an early stage (e.g. paper prototypes), before actually starting to build it. The prototype-and-feedback loop is automated here, but humans still provide the persona definition, the app idea and product goals, and the hypotheses that need validation.
65k synthetic personas replace human polling for UK election forecasts.
Loop driver + 15 slash commands for Claude Code, but orchestration over integration.
Automated rollback on regression is a killer feature LangSmith doesn't have.
MCP proxy prevents context-window bloat better than naive LLM forecasting.
Fascinating art experiment, but more novelty than tool developers would actually use.