A GPU/VRAM filter for finding LLMs that will run on your hardware

Author: mzubairtahir

by mzubairtahir·Jun 26, 2026·2 points·3 comments


AI Analysis

●● Solid · Solve My Problem · Slick

Filters LLMs by your GPU VRAM with CPU offloading calculations.

Strengths

• CPU offloading calculations show realistic hybrid GPU+RAM memory usage
• Compares multiple quantization formats side-by-side for the same model
• Context length factored into memory estimates, not just model weights
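
To make the strengths above concrete, here is a minimal sketch of how such an estimate could be put together: quantized weights plus a context-dependent KV cache, with whatever does not fit in VRAM offloaded to system RAM. This is not the project's actual formula. The `ModelSpec` fields, the 1 GiB overhead, and the assumption that memory is spread evenly across layers are all illustrative simplifications.

```python
from dataclasses import dataclass

GIB = 1024 ** 3


@dataclass
class ModelSpec:
    # Hypothetical fields; real values would come from a model's config file.
    n_params: float   # total parameter count
    n_layers: int     # number of transformer layers
    n_kv_heads: int   # key/value heads per layer
    head_dim: int     # dimension of each head


def weight_bytes(spec: ModelSpec, bits_per_weight: float) -> float:
    """Size of the quantized weights in bytes."""
    return spec.n_params * bits_per_weight / 8


def kv_cache_bytes(spec: ModelSpec, context_len: int, bytes_per_elem: int = 2) -> int:
    """KV cache size: one K and one V tensor per layer, each
    context_len x n_kv_heads x head_dim elements."""
    return 2 * spec.n_layers * spec.n_kv_heads * spec.head_dim * context_len * bytes_per_elem


def split_layers(spec, bits_per_weight, context_len, vram_gib, overhead_gib=1.0):
    """Return (layers on GPU, GiB used on GPU, GiB offloaded to CPU RAM).

    Assumes memory is spread evenly across layers, which ignores
    embeddings and other non-layer tensors.
    """
    total = weight_bytes(spec, bits_per_weight) + kv_cache_bytes(spec, context_len)
    per_layer = total / spec.n_layers
    budget = max(vram_gib - overhead_gib, 0) * GIB
    gpu_layers = min(spec.n_layers, int(budget // per_layer))
    gpu_bytes = gpu_layers * per_layer
    return gpu_layers, gpu_bytes / GIB, (total - gpu_bytes) / GIB


# Illustrative 8B-parameter shape, 4-bit weights, 8192-token context.
spec = ModelSpec(n_params=8e9, n_layers=32, n_kv_heads=8, head_dim=128)
print(split_layers(spec, bits_per_weight=4, context_len=8192, vram_gib=8))
print(split_layers(spec, bits_per_weight=4, context_len=8192, vram_gib=4))
```

With the 8 GiB budget every layer stays on the GPU. With 4 GiB only part of the model fits and the remainder is counted against system RAM, which is the hybrid case the first bullet refers to.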

Weaknesses

• Similar calculators exist in llama.cpp and various online tools
• Database limited to curated models, misses newly released ones

Post Description

I kept seeing people ask "Which model can I run on my GPU?" and "Will model X fit on my GPU?". That's why I built a filter on whichllmmodel that lets you search models by what will actually fit on your hardware (8GB, 16GB, 24GB, etc.) at a given quantization level.
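
As a rough illustration of the filtering idea described in the post, the sketch below keeps only the models whose quantized weights fit in a given VRAM budget. It is not the site's implementation. The catalog entries are made-up placeholders, the bits-per-weight figures are assumed nominal values, and the fixed headroom stands in for the KV cache and runtime overhead that a real calculator would estimate separately.

```python
GIB = 1024 ** 3

# Assumed nominal bits per weight for each quantization level.
BITS_PER_WEIGHT = {"fp16": 16.0, "q8": 8.5, "q4": 4.5}

# Hypothetical catalog: (model name, parameter count).
CATALOG = [
    ("example-7b", 7e9),
    ("example-13b", 13e9),
    ("example-70b", 70e9),
]


def models_that_fit(vram_gib: float, quant: str, headroom_gib: float = 1.5) -> list[str]:
    """Return catalog models whose quantized weights fit in VRAM.

    headroom_gib is an assumed allowance for the KV cache, activations,
    and driver overhead; it is a placeholder, not a measured value.
    """
    budget = (vram_gib - headroom_gib) * GIB
    bits = BITS_PER_WEIGHT[quant]
    return [name for name, n_params in CATALOG if n_params * bits / 8 <= budget]


for vram in (8, 16, 24):
    print(vram, "GB at q4:", models_that_fit(vram, "q4"))
```

Changing the `quant` argument shows why the quantization level matters: a model that fits at `q4` may not fit at `fp16` on the same card.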