HF-agents, CLI extension to find the best model/quant for your hardware

Author: clmnt

by clmnt·Mar 18, 2026·2 points·0 comments


AI Analysis

●● Solid · Solve My Problem · Slick

One command replaces manual GGUF hunting and hardware compatibility guesswork.

Strengths

• Hardware profiling via llmfit eliminates trial-and-error selection of model quantization.
• Reuses an existing llama-server instance instead of spawning redundant processes.
• Official HF CLI extension means proper integration with model hub authentication.
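
The server-reuse point can be pictured as a check-before-spawn step. The sketch below is only an illustration of that idea, not hf-agents' actual code: the helper names `is_port_open` and `ensure_server` are hypothetical, and a plain TCP probe only shows that something is listening on the port, not that it is llama-server.

```python
import socket
from typing import Callable


def is_port_open(host: str, port: int, timeout: float = 0.5) -> bool:
    """Return True if something accepts TCP connections on host:port."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Connection refused, timed out, or host unreachable.
        return False


def ensure_server(host: str, port: int, spawn: Callable[[], None]) -> str:
    """Reuse a listening server if one is present; otherwise call spawn().

    `spawn` is a caller-supplied function (hypothetical here) that would
    start the server process. Returns "reused" or "spawned".
    """
    if is_port_open(host, port):
        return "reused"
    spawn()
    return "spawned"
```

A caller would pass the host and port it expects the server on (llama-server listens on port 8080 by default) and a function that starts the process only when nothing is listening yet.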

Weaknesses

• Orchestrates existing tools (llmfit, llama.cpp, Pi) rather than introducing a novel architecture.
• Limited to the Pi agent; no support for other local coding agent frameworks.

Post Description

We've been building out CLI extensions for the Hugging Face hub, and hf-agents is a fun one to share.

It uses llmfit under the hood to profile your hardware and automatically select the best-fit model and quantization — no manual GGUF hunting. It then launches a Pi Agent on top of it. One command, local, fully open.
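
To give a feel for what "best-fit model and quantization" can mean, here is a minimal sketch of picking the largest quantization that fits a memory budget. It is not llmfit's algorithm: the bits-per-weight table, the fixed `overhead_gb` allowance, and the function names are all assumptions made for illustration.

```python
from typing import Optional

# Rough, illustrative bits-per-weight figures for common GGUF quant types.
# Real file sizes vary by model architecture; treat these as assumptions.
APPROX_BITS_PER_WEIGHT = {
    "Q8_0": 8.5,
    "Q6_K": 6.6,
    "Q5_K_M": 5.7,
    "Q4_K_M": 4.8,
    "Q3_K_M": 3.9,
}


def estimate_size_gb(params_billion: float, bits_per_weight: float) -> float:
    """Approximate weight size in GB: parameters (billions) * bits / 8."""
    return params_billion * bits_per_weight / 8.0


def pick_quant(
    params_billion: float, memory_gb: float, overhead_gb: float = 1.5
) -> Optional[str]:
    """Return the highest-precision quant whose weights fit the budget.

    overhead_gb is a hypothetical flat allowance for KV cache and runtime
    buffers. Returns None when no listed quant fits.
    """
    budget = memory_gb - overhead_gb
    by_precision = sorted(
        APPROX_BITS_PER_WEIGHT.items(), key=lambda kv: kv[1], reverse=True
    )
    for name, bpw in by_precision:
        if estimate_size_gb(params_billion, bpw) <= budget:
            return name
    return None
```

Under these assumed figures, a 7B model with 8 GB of memory lands on Q6_K, a 13B model on the same machine drops to Q3_K_M, and a 70B model gets no match at all. A real profiler would also weigh speed, context length, and quality rather than size alone.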

If you've been using Claude Code or Codex CLI and want something that runs entirely on your own hardware/models, this is a nice lightweight alternative to try.

Happy to answer questions — and curious what hardware setups people are running this on.

Similar Projects

AI/ML · ●●● Banger

HF viewer – visualize any Hugging Face model

Paste any HF URL to instantly see the full transformer architecture graph.

Eye Candy · Rabbit Hole · Wizardry

vottivott

50 · 1mo ago

AI/ML · ● Mid

Run Hugging Face models with a single command

Yet another model runner when Ollama already dominates this space.

Slick · Ship It

dataversity

23 · 3mo ago

AI/ML · ●● Solid

Llmfit: 94 models, 30 providers, 1 tool to see what runs on your hardware

The project nails a real pain: instead of guessing whether a 7B or 13B model will fit, llmfit inspects your system and ranks 94 models by fit, speed, context and quality, even recommending quantization and run modes and supporting multi‑GPU and MoE setups. The combo of an installable binary, interactive TUI for quick browsing and JSON output for automation makes it immediately useful; just remember its suggestions are heuristics — you’ll still want to validate edge cases with a real run.

Solve My Problem · Wizardry

axjns

10 · 3mo ago

AI/ML · ● Mid