A GPU/VRAM filter for finding LLMs that will run on your hardware

by mzubairtahir · Jun 26, 2026 · 2 points · 3 comments

AI Analysis

●● Solid · Solve My Problem · Slick

Filters LLMs by how much GPU VRAM you have, with CPU offloading calculations included.

Strengths
  • CPU offloading calculations show realistic hybrid GPU+RAM memory usage
  • Compares multiple quantization formats side-by-side for the same model
  • Context length factored into memory estimates, not just model weights
Weaknesses
  • Similar calculators exist in llama.cpp and various online tools
  • Database limited to curated models, misses newly released ones
Category
Target Audience

Local LLM runners, hobbyists with consumer GPUs

Similar To

llama.cpp · GPT4All · Hugging Face

Post Description

I kept seeing people ask "Which model can I run on my GPU?" and "Will model X fit on my GPU?" That's why I built a filter on whichllmmodel that lets you search models by what will actually fit on your hardware (8GB, 16GB, 24GB, etc.) at a given quantization level.
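
To give a feel for the arithmetic behind a "will it fit" check, here is a rough sketch in Python. It is not the tool's actual method: the names (`ModelSpec`, `fits`, `gpu_layers`), the fixed 0.5 GiB runtime overhead, and the example model shape are all assumptions made for illustration, and the bits-per-weight figures are approximate. It estimates weight memory from parameter count and quantization, adds an fp16 KV cache that grows with context length, and, when the model does not fit, estimates how many layers could stay on the GPU with the rest offloaded to CPU RAM.

```python
from dataclasses import dataclass
import math

GIB = 1024 ** 3
OVERHEAD_BYTES = 0.5 * GIB  # assumed runtime overhead (buffers, scratch space)

# Approximate bits per weight; real files vary by model and quantization mix.
BITS_PER_WEIGHT = {"F16": 16.0, "Q8_0": 8.5, "Q4_K_M": 4.8}


@dataclass
class ModelSpec:
    """Hypothetical model description used only for this sketch."""
    name: str
    n_params: float   # total parameter count
    n_layers: int
    n_kv_heads: int
    head_dim: int


def weight_bytes(m: ModelSpec, quant: str) -> float:
    return m.n_params * BITS_PER_WEIGHT[quant] / 8


def kv_cache_bytes(m: ModelSpec, context_len: int, bytes_per_elem: int = 2) -> int:
    # Keys and values (factor 2) for every layer, KV head, and token, in fp16.
    return 2 * m.n_layers * m.n_kv_heads * m.head_dim * context_len * bytes_per_elem


def fits(m: ModelSpec, quant: str, context_len: int, vram_gib: float) -> bool:
    total = weight_bytes(m, quant) + kv_cache_bytes(m, context_len) + OVERHEAD_BYTES
    return total <= vram_gib * GIB


def gpu_layers(m: ModelSpec, quant: str, context_len: int, vram_gib: float) -> int:
    # Simplification: weights are spread evenly across layers and the whole
    # KV cache stays on the GPU; the remaining layers run from CPU RAM.
    budget = vram_gib * GIB - kv_cache_bytes(m, context_len) - OVERHEAD_BYTES
    per_layer = weight_bytes(m, quant) / m.n_layers
    return max(0, min(m.n_layers, math.floor(budget / per_layer)))


# Illustrative 8B-class shape, not taken from any specific model card.
example = ModelSpec("example-8b", 8.0e9, n_layers=32, n_kv_heads=8, head_dim=128)

for quant in BITS_PER_WEIGHT:
    for vram in (8, 16, 24):
        ok = fits(example, quant, 8192, vram)
        layers = gpu_layers(example, quant, 8192, vram)
        print(f"{quant:7s} {vram:2d} GiB: fits={ok}, gpu_layers={layers}/{example.n_layers}")
```

Under these assumptions the context length matters as much as the headline parameter count: at 8192 tokens the KV cache alone for this example shape is 1 GiB, and it scales linearly with context.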

Similar Projects

AI/ML · ●●● Banger

Whichllm – Find and run the best local LLM for your hardware

One command finds and runs the best local LLM for your exact hardware specs.

Solve My Problem · Big Brain · Niche Gem
andyyyy64
303mo ago