GitHub Repository

A GUI-first evaluation workbench for local LLMs running on Ollama. Build personal test suites, run sequential evaluations across installed models, visualize results through dashboards, and make keep-or-delete decisions. Think "Postman for local LLM evaluation."

19 stars · TypeScript

ModelSweep - Open-Source Benchmarking for Local LLMs

by leonickson·Mar 17, 2026·2 points·0 comments

AI Analysis

●● Solid · Ship It · Niche Gem · Slick

Postman for local LLMs with LLM-as-Judge and Elo ratings built in.
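
The listing does not say how ModelSweep computes its Elo ratings. As a rough illustration of the general idea, here is the textbook Elo update applied to a head-to-head comparison between two models. The K-factor of 32, the 1500 starting rating, and the function names are assumptions for this sketch, not details taken from the project.

```typescript
// Textbook Elo update. K = 32 is an assumed value, not taken from ModelSweep.
type Outcome = 1 | 0.5 | 0; // score for model A: win, draw, loss

// Probability that A beats B under the Elo model.
function expectedScore(ratingA: number, ratingB: number): number {
  return 1 / (1 + Math.pow(10, (ratingB - ratingA) / 400));
}

// Returns the new [ratingA, ratingB] after one comparison.
function updateElo(
  ratingA: number,
  ratingB: number,
  outcomeA: Outcome,
  k = 32,
): [number, number] {
  const delta = k * (outcomeA - expectedScore(ratingA, ratingB));
  return [ratingA + delta, ratingB - delta];
}

// Two models start level at 1500 and a judge prefers model A.
const [newRatingA, newRatingB] = updateElo(1500, 1500, 1);
console.log(newRatingA, newRatingB); // prints 1516 1484
```

In an LLM-as-Judge setup, the outcome would come from the judge model's verdict on a pair of responses; the update is zero-sum, so the points one model gains are the points the other loses.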

Strengths
  • Sequential model testing with automatic VRAM preload/unload management
  • Four evaluation modes including adversarial red team testing scenarios
  • Fully local execution with zero data leaving the machine
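
One way the sequential preload/unload pattern could look against Ollama's HTTP API is sketched below. It assumes Ollama is listening on its default `http://localhost:11434` and relies on documented `/api/generate` behavior: a request without a prompt loads the model, and `keep_alive: 0` unloads it. The helper names `planRun` and `runSequentially` are hypothetical, and this is not ModelSweep's actual implementation.

```typescript
// Sketch only: helper names are hypothetical, not ModelSweep's code.
interface GenerateRequest {
  model: string;
  prompt?: string;
  stream: boolean;
  keep_alive?: number | string;
}

// Ordered requests for one model: preload, run each prompt, then unload.
function planRun(model: string, prompts: string[]): GenerateRequest[] {
  // No prompt: Ollama just loads the model into memory.
  const preload: GenerateRequest = { model, stream: false };
  const runs = prompts.map(
    (prompt): GenerateRequest => ({ model, prompt, stream: false }),
  );
  // keep_alive 0 with no prompt: unload the model immediately.
  const unload: GenerateRequest = { model, stream: false, keep_alive: 0 };
  return [preload, ...runs, unload];
}

// Work through models one at a time so only one occupies VRAM.
async function runSequentially(
  models: string[],
  prompts: string[],
  baseUrl = "http://localhost:11434", // assumed default Ollama address
): Promise<void> {
  for (const model of models) {
    for (const body of planRun(model, prompts)) {
      const res = await fetch(`${baseUrl}/api/generate`, {
        method: "POST",
        headers: { "Content-Type": "application/json" },
        body: JSON.stringify(body),
      });
      if (!res.ok) throw new Error(`Ollama returned ${res.status} for ${model}`);
      await res.json(); // a real harness would score or store this response
    }
  }
}
```

Separating the pure `planRun` step from the network loop keeps the ordering logic easy to check without a running Ollama instance.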
Weaknesses
  • Built in two days, so bugs and rough edges remain
  • Ollama-only support rules out other model runners
Category
Target Audience

Developers testing local LLMs, Ollama users, AI researchers

Similar To

LangSmith · MLflow · LM Evaluation Harness

Similar Projects

Developer Tools · ●● Solid

A minimal context engine with streaming API

Git-like versioning for prompts running entirely locally with Ollama.

Niche Gem · Solve My Problem
tonelord
201mo ago