I built a playground for interactive A/B testing of RAG

Author: Hanhan2024

by Hanhan2024·May 8, 2026·2 points·0 comments


AI Analysis

● Mid · Ship It

Yet another RAG eval UI in a field crowded with Arize, LangSmith, and Ragas.

Strengths

•Targets the specific friction of non-technical experts reviewing flagged entries.
•Zero-setup web playground allows immediate testing without local installation.

Weaknesses

•Only includes one tiny dataset (30 records), limiting real-world utility.
•No clear technical differentiation from established observability platforms.

Post Description

Iteratively improving RAG performance with current evaluation solutions still takes a lot of manual work or a lot of coding. It also requires close collaboration between AI engineers and domain experts (who may not know how to code).

So I built this playground to show a smoother workflow for continuously improving RAG. It can 1) run RAG pipelines and return evaluation results quickly, 2) generate insights that both technical and non-technical users can understand, and 3) provide a UI that makes it easier for domain experts to review and update flagged entries.
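The loop described above (run two RAG variants, score them, route flagged entries to a domain expert) can be illustrated with a minimal, hypothetical Python sketch. It A/B tests only the retrieval step, using a naive word-overlap retriever with and without stopword removal. Everything here is an assumption made for illustration: the toy corpus and eval set, the stopword list, the names `Variant`, `retrieve`, and `run_ab`, and the rule "flag an entry if either variant misses its gold document". It is not the playground's implementation or API, and it does not use the project's 30-record dataset.

```python
"""Hypothetical sketch: A/B test two retrieval variants, flag entries for expert review."""
from dataclasses import dataclass

# Toy corpus (doc id -> text), invented for illustration only.
CORPUS = {
    "d1": "refunds are issued within 14 days of purchase",
    "d2": "shipping is free for orders above 50 dollars",
    "d3": "support is available by email on weekdays",
}

# Toy eval set: (question, id of the document that should be retrieved).
EVAL_SET = [
    ("how many days until refunds are issued", "d1"),
    ("is shipping free", "d2"),
    ("when can I email support", "d3"),
    ("what is the return window", "d1"),
    ("is there a fee for refunds by card", "d1"),
]

# Assumed stopword list; a real system would use a proper tokenizer.
STOPWORDS = {"a", "the", "is", "are", "of", "for", "on", "by", "what", "how", "when", "i", "can"}


@dataclass(frozen=True)
class Variant:
    name: str
    drop_stopwords: bool
    top_k: int = 1


def tokenize(text: str, drop_stopwords: bool) -> set[str]:
    words = set(text.lower().split())
    return words - STOPWORDS if drop_stopwords else words


def retrieve(question: str, variant: Variant) -> list[str]:
    """Rank docs by word overlap with the question, skipping docs with zero overlap."""
    q = tokenize(question, variant.drop_stopwords)
    scores = {d: len(q & tokenize(t, variant.drop_stopwords)) for d, t in CORPUS.items()}
    # sorted() is stable, so ties keep corpus order.
    ranked = sorted((d for d in CORPUS if scores[d] > 0), key=scores.get, reverse=True)
    return ranked[: variant.top_k]


def run_ab(a: Variant, b: Variant):
    """Return each variant's hit rate and the entries where at least one variant missed."""
    hits = {a.name: 0, b.name: 0}
    flagged = []
    for question, gold in EVAL_SET:
        results = {v.name: gold in retrieve(question, v) for v in (a, b)}
        for name, hit in results.items():
            hits[name] += hit
        if not all(results.values()):
            flagged.append({"question": question, "gold": gold, **results})
    rates = {name: n / len(EVAL_SET) for name, n in hits.items()}
    return rates, flagged


if __name__ == "__main__":
    rates, flagged = run_ab(Variant("A_raw", False), Variant("B_stopwords", True))
    print(rates)  # → {'A_raw': 0.6, 'B_stopwords': 0.8}
    for row in flagged:
        print("needs expert review:", row)
```

In this toy run, stopword removal fixes "is there a fee for refunds by card", while "what is the return window" fails under both variants because no document shares its vocabulary. That second kind of entry is where an automated metric runs out and a domain expert (who knows "return window" means refunds) is best placed to update the entry.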