AptSelect – A local LLM client for parallel testing and evaluation

by dhavalt·Jun 17, 2026·2 points·0 comments

AI Analysis

●● Solid · Solve My Problem · Niche Gem

Parallel LLM testing across providers, for when LangSmith costs far more than the job warrants.

Strengths
  • API keys are encrypted with the OS keyring, so credentials stay on the local machine and never reach a remote server.
  • Side-by-side output comparison with exact token usage and latency metrics.
  • Manual diagnostic tags build human-verified performance leaderboards over time.
Weaknesses
  • LLM eval space is crowded with LangSmith, Braintrust, and Arize already established.
  • Electron adds overhead when a native app could be lighter for this use case.
Category
Target Audience

LLM developers and prompt engineers

Similar To

LangSmith · Braintrust · Arize Phoenix

Post Description

I built AptSelect to stop writing throwaway scripts every time I needed to test how different LLMs handle specific instructions and prompt edge cases.

What it does:

Parallel Execution: Send a single prompt to OpenAI, Anthropic, Mistral, and Gemini simultaneously. Compare the outputs, latency, and exact token usage side-by-side.

Batch Evaluations: Upload a CSV dataset to run bulk tests across multiple models at once.

Manual Diagnostics: Grade outputs manually (Pass/Fail) and assign diagnostic tags (e.g., Hallucination, Format Error) to build a human-verified performance leaderboard.

Local-first: API keys encrypted with your OS keyring; history stored in a local SQLite DB; no telemetry.

I’m looking for technical feedback. What do you think current LLM testing/evaluation tools get most wrong?
