AptSelect – A local LLM client for parallel testing and evaluation

Author: dhavalt

by dhavalt · Jun 17, 2026 · 2 points · 0 comments

AI Analysis

●● Solid · Solve My Problem · Niche Gem

Parallel LLM testing across providers, for cases where LangSmith costs far more.

Strengths

• API keys are encrypted with the OS keyring, so credentials stay on your machine instead of on a third-party server.
• Side-by-side output comparison with exact token usage and latency metrics.
• Manual diagnostic tags build human-verified performance leaderboards over time.

Weaknesses

• LLM eval space is crowded with LangSmith, Braintrust, and Arize already established.
• Electron adds overhead when a native app could be lighter for this use case.

Post Description

I built AptSelect to stop writing throwaway scripts every time I needed to test how different LLMs handle specific instructions and prompt edge cases.

What it does:

Parallel Execution: Send a single prompt to OpenAI, Anthropic, Mistral, and Gemini simultaneously. Compare the outputs, latency, and exact token usage side-by-side.

Batch Evaluations: Upload a CSV dataset to run bulk tests across multiple models at once.

Manual Diagnostics: Grade outputs manually (Pass/Fail) and assign diagnostic tags (e.g., Hallucination, Format Error) to build a human-verified performance leaderboard.

Local-first: API keys encrypted with your OS keyring; history stored in a local SQLite DB; no telemetry.
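
The parallel execution idea above can be sketched roughly as follows. This is a minimal illustration, not AptSelect's actual code: `fake_provider_call` is a hypothetical stand-in for a real provider SDK call, the delays are made up, and the token figure is a whitespace word count rather than the usage number a real API would report.

```python
import asyncio
import time
from dataclasses import dataclass


@dataclass
class RunResult:
    provider: str
    output: str
    latency_s: float
    total_tokens: int


async def fake_provider_call(provider: str, prompt: str, delay_s: float) -> tuple[str, int]:
    """Hypothetical stand-in for a real provider SDK call (assumption)."""
    await asyncio.sleep(delay_s)  # simulates network and generation time
    output = f"[{provider}] echo: {prompt}"
    # A whitespace word count stands in for the usage figure a real API reports.
    return output, len(prompt.split()) + len(output.split())


async def timed_call(provider: str, prompt: str, delay_s: float) -> RunResult:
    # Measure wall-clock latency around a single provider call.
    start = time.perf_counter()
    output, tokens = await fake_provider_call(provider, prompt, delay_s)
    return RunResult(provider, output, time.perf_counter() - start, tokens)


async def run_parallel(prompt: str, delays: dict[str, float]) -> list[RunResult]:
    # gather() runs all calls concurrently and returns results in input order.
    tasks = [timed_call(name, prompt, delay) for name, delay in delays.items()]
    return await asyncio.gather(*tasks)


if __name__ == "__main__":
    delays = {"openai": 0.03, "anthropic": 0.02, "mistral": 0.01, "gemini": 0.02}
    for r in asyncio.run(run_parallel("Reply with the word OK", delays)):
        print(f"{r.provider:<10} {r.latency_s * 1000:6.1f} ms  {r.total_tokens:3d} tokens  {r.output}")
```

Because the calls run concurrently, the total wall-clock time is close to the slowest provider's latency rather than the sum of all four.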

I’m looking for technical feedback. What do you think current LLM testing/evaluation tools get most wrong?