I built a playground for iterative A/B testing of RAG

by Hanhan2024·May 8, 2026·2 points·0 comments

AI Analysis

MidShip It

Yet another RAG eval UI in a field crowded with Arize, LangSmith, and Ragas.

Strengths
  • Targets the specific friction of non-technical experts reviewing flagged entries.
  • Zero-setup web playground allows immediate testing without local installation.
Weaknesses
  • Only includes one tiny dataset (30 records), limiting real-world utility.
  • No clear technical differentiation from established observability platforms.
Category
Target Audience

AI engineers and domain experts collaborating on RAG systems

Similar To

LangSmith · Arize Phoenix · Ragas

Post Description

Iteratively improving RAG performance with current evaluation solutions still takes a lot of manual work or a lot of coding. It also requires close collaboration between AI engineers and domain experts (who may not know how to code).

So I built this playground to show a smoother workflow that enables continuous improvement of RAG. It can 1) run RAGs and get evaluation results quickly, 2) generate insights that both technical and non-technical people can understand, and 3) provide a UI that makes it easier for domain experts to review and update flagged entries.
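To make the A/B idea concrete, here is a rough sketch of what one comparison pass might look like. It is not the playground's actual code: `token_f1`, `ab_compare`, the `question`/`reference` record fields, and the `flag_below` threshold are all hypothetical names chosen for illustration, and token-overlap F1 is only a stand-in for whatever metric the playground really uses. Each RAG variant is assumed to be a plain function that takes a question and returns an answer string.

```python
from typing import Callable, Dict, List


def token_f1(prediction: str, reference: str) -> float:
    """Token-overlap F1 between two strings (stand-in metric, an assumption)."""
    pred = prediction.lower().split()
    ref = reference.lower().split()
    if not pred or not ref:
        return 0.0
    remaining = list(ref)
    overlap = 0
    for tok in pred:
        if tok in remaining:
            remaining.remove(tok)  # count each reference token at most once
            overlap += 1
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred)
    recall = overlap / len(ref)
    return 2 * precision * recall / (precision + recall)


def ab_compare(
    records: List[Dict[str, str]],
    variant_a: Callable[[str], str],
    variant_b: Callable[[str], str],
    flag_below: float = 0.5,  # hypothetical review threshold
) -> List[Dict[str, object]]:
    """Run both variants on every record and flag entries neither answers well."""
    rows = []
    for rec in records:
        score_a = token_f1(variant_a(rec["question"]), rec["reference"])
        score_b = token_f1(variant_b(rec["question"]), rec["reference"])
        if score_a > score_b:
            winner = "A"
        elif score_b > score_a:
            winner = "B"
        else:
            winner = "tie"
        rows.append(
            {
                "question": rec["question"],
                "score_a": score_a,
                "score_b": score_b,
                "winner": winner,
                # Flagged rows are the ones a domain expert would review.
                "flagged": max(score_a, score_b) < flag_below,
            }
        )
    return rows
```

The flagged rows are what would go to a domain expert in step 3: they can correct the reference answer or the source document, and the same comparison can then be rerun on the updated records.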

Similar Projects