Aludel – LLM eval workbench for Phoenix apps
Phoenix LiveView embedding beats switching to LangSmith for Elixir teams.
A GUI-first evaluation workbench for local LLMs running on Ollama. Build personal test suites, run sequential evaluations across installed models, visualize results through dashboards, and make keep-or-delete decisions. Think "Postman for local LLM evaluation."
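The sequential evaluation loop described above can be pictured with a minimal Python sketch. Every name here is a hypothetical illustration, not Aludel's actual API: `EvalCase`, `run_suite`, `keep_or_delete`, the 0.7 pass-rate threshold, and the exact-match judge. In practice `generate` might wrap Ollama's local HTTP API and `judge` might be an LLM-as-Judge call. Here both are stubbed so the example runs offline.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class EvalCase:
    """Hypothetical test case: a prompt plus a reference answer for the judge."""
    prompt: str
    expected: str


def run_suite(
    models: List[str],
    cases: List[EvalCase],
    generate: Callable[[str, str], str],
    judge: Callable[[EvalCase, str], bool],
) -> Dict[str, float]:
    """Run every case against every model, one call at a time, and return pass rates."""
    pass_rates: Dict[str, float] = {}
    for model in models:  # sequential: finish one model before starting the next
        passed = 0
        for case in cases:
            answer = generate(model, case.prompt)
            if judge(case, answer):
                passed += 1
        pass_rates[model] = passed / len(cases) if cases else 0.0
    return pass_rates


def keep_or_delete(pass_rates: Dict[str, float], threshold: float = 0.7) -> Dict[str, str]:
    """Assumed decision rule: keep models at or above the threshold, flag the rest."""
    return {m: ("keep" if r >= threshold else "delete") for m, r in pass_rates.items()}


# Stubs standing in for a real model call and a real judge (assumptions).
def fake_generate(model: str, prompt: str) -> str:
    return "4" if model == "model-a" else "5"


def exact_match(case: EvalCase, answer: str) -> bool:
    return answer.strip() == case.expected


if __name__ == "__main__":
    rates = run_suite(["model-a", "model-b"], [EvalCase("What is 2 + 2?", "4")],
                      fake_generate, exact_match)
    print(rates, keep_or_delete(rates))
```

A fixed threshold is only one possible keep-or-delete rule. A dashboard could just as well rank models and leave the decision to the user.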
Postman for local LLMs with LLM-as-Judge and Elo ratings built in.
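Elo ratings turn pairwise judge verdicts into a single ranking. The sketch below applies the standard Elo update. It assumes the judge scores each head-to-head comparison as 1.0 (first model preferred), 0.0 (second preferred), or 0.5 (tie). The starting rating of 1000 and the K-factor of 32 are illustrative defaults, not values the listing specifies, and the model names are placeholders.

```python
from typing import Dict, Iterable, Tuple


def expected_score(rating_a: float, rating_b: float) -> float:
    """Probability that A beats B under the logistic Elo model."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


def update_elo(rating_a: float, rating_b: float, score_a: float,
               k: float = 32.0) -> Tuple[float, float]:
    """Return updated (rating_a, rating_b) after one judged comparison."""
    delta = k * (score_a - expected_score(rating_a, rating_b))
    return rating_a + delta, rating_b - delta  # zero-sum: B loses what A gains


def rate_models(verdicts: Iterable[Tuple[str, str, float]],
                initial: float = 1000.0, k: float = 32.0) -> Dict[str, float]:
    """Fold (model_a, model_b, score_a) judge verdicts into per-model ratings."""
    ratings: Dict[str, float] = {}
    for a, b, score_a in verdicts:
        ra = ratings.setdefault(a, initial)
        rb = ratings.setdefault(b, initial)
        ratings[a], ratings[b] = update_elo(ra, rb, score_a, k)
    return ratings


if __name__ == "__main__":
    # Two equally rated models; the judge prefers "model-a" once.
    print(rate_models([("model-a", "model-b", 1.0)]))
```

With equal starting ratings the expected score is 0.5, so a single win moves each model by K/2 = 16 points. Because updates are order-dependent, a workbench would typically replay verdicts in the order they were judged.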
Audience: developers testing local LLMs, Ollama users, AI researchers
Alternatives: LangSmith · MLflow · LM Evaluation Harness
Task-specific LLM benchmarking beats generic leaderboards that ignore your actual workload.
AST-based validation for function-calling tests, but BFCL already covers this ground.
Opposite-narrator test catches models agreeing with both sides of the same dispute.
Git-like versioning for prompts running entirely locally with Ollama.
Expands corpus to 16 CVE-anchored scenarios to break model ties.