GitHub Repository

LLM orchestration toolkit for agent workflows: planner + workers + synthesis, optional router (LLM + learned fallback), supports OpenAI/Anthropic/Ollama/llama.cpp, real scraping with caching, MCP server integration, and a TUI chat UI.

41 stars · Python

LLM-use – cost-effective LLM orchestrator for agents

by justvugg · Feb 19, 2026 · 2 points · 1 comment

AI Analysis

●● Solid · Niche Gem · Big Brain
The Take

The standout idea is smart local-first routing that escalates to expensive cloud planners only when necessary. Combined with per-run cost accounting and full Ollama offline support, it addresses a real operational pain point. The repo is a pragmatic, CLI/TUI-focused toolkit (scraping with caching, MCP server mode) that should be useful for teams wanting a no-friction orchestrator, but it sits in a crowded space of agent frameworks, so the novelty is incremental rather than revolutionary.

Category
Target Audience

ML engineers, AI developers, and backend devs who build hybrid local/cloud LLM agent workflows

Post Description

Hi HN,

Built llm-use: a lightweight Python toolkit for efficient agent workflows with multiple LLMs.

Core pattern: strong model (Claude/GPT-4o/big local) for planning + synthesis; cheap/local workers for parallel subtasks (research, scrape, summarize, extract…).

Features:
• Mix Anthropic, OpenAI, Ollama, llama.cpp
• Smart router: cheap/local first, escalate only if needed (learned + heuristic)
• Parallel workers (--max-workers)
• Real scraping + cache (BS4 or Playwright)
• Offline-first (full Ollama support)
• Cost tracking ($ for cloud, 0 local)
• TUI chat + MCP server mode
• Local session logs

Quick example (hybrid):

    python3 cli.py exec \
      --orchestrator anthropic:claude-3-7-sonnet-20250219 \
      --worker ollama:llama3.1:8b \
      --enable-scrape \
      --task "Summarize 6 recent sources on post-quantum crypto"
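The planner + workers + synthesis pattern behind this command can be sketched in a few lines of Python. This is an illustrative sketch, not llm-use's actual code: `run_task`, the `LLM` callable type, and the stub models are all hypothetical names, and the prompts are placeholders.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable

# Hypothetical type: any function mapping a prompt string to a completion string.
LLM = Callable[[str], str]


def run_task(task: str, orchestrator: LLM, worker: LLM, max_workers: int = 4) -> str:
    # 1. Planning: the strong model splits the task into one subtask per line.
    plan = orchestrator(f"Split into independent subtasks, one per line:\n{task}")
    subtasks = [line.strip() for line in plan.splitlines() if line.strip()]

    # 2. Workers: the cheap model handles the subtasks in parallel.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        results = list(pool.map(worker, subtasks))

    # 3. Synthesis: the strong model merges the worker outputs.
    notes = "\n".join(f"- {s}: {r}" for s, r in zip(subtasks, results))
    return orchestrator(f"Synthesize a final answer for '{task}' from:\n{notes}")


# Stub models (assumptions) so the sketch runs offline without any provider.
def fake_orchestrator(prompt: str) -> str:
    if prompt.startswith("Split"):
        return "find sources\nsummarize sources"
    return "FINAL: " + prompt


def fake_worker(subtask: str) -> str:
    return f"done: {subtask}"


answer = run_task("demo task", fake_orchestrator, fake_worker, max_workers=2)
print(answer)
```

In a real setup the two callables would wrap provider clients (for example a cloud model as orchestrator and a local Ollama model as worker); the stubs only demonstrate the control flow.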

Or routed version:

    python3 cli.py exec \
      --router ollama:llama3.1:8b \
      --orchestrator openai:o1 \
      --worker gpt-4o-mini \
      --task "Explain recent macOS security updates"
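The "cheap/local first, escalate only if needed" idea, together with per-run cost tracking, might look roughly like the sketch below. It is a simplified illustration under stated assumptions: the `Model` class, the `looks_sufficient` heuristic, and the flat per-call price are all hypothetical, and the project's own router (an LLM plus a learned fallback) is not reproduced here.

```python
from dataclasses import dataclass
from typing import Callable, Tuple


@dataclass
class Model:
    name: str
    call: Callable[[str], str]   # prompt -> answer
    usd_per_call: float = 0.0    # hypothetical flat price; local models cost 0


def looks_sufficient(answer: str) -> bool:
    # Hypothetical heuristic: reject empty or explicitly uncertain answers.
    return bool(answer.strip()) and "i don't know" not in answer.lower()


def route(prompt: str, cheap: Model, strong: Model) -> Tuple[str, str, float]:
    """Try the cheap model first; escalate only if its answer fails the check.

    Returns (answer, name of the model that produced it, total cost in USD).
    """
    cost = cheap.usd_per_call
    answer = cheap.call(prompt)
    if looks_sufficient(answer):
        return answer, cheap.name, cost
    # Escalation: pay for the strong model on top of the cheap attempt.
    cost += strong.usd_per_call
    return strong.call(prompt), strong.name, cost


# Stub models (assumptions) standing in for a local and a cloud provider.
local = Model("local", lambda p: "I don't know" if "hard" in p else "4")
cloud = Model("cloud", lambda p: "detailed answer", usd_per_call=0.02)

easy = route("2 + 2?", local, cloud)
hard = route("a hard question", local, cloud)
print(easy, hard)
```

Real providers bill per token rather than per call, so a production version would compute cost from token counts; the flat price just keeps the accounting visible.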

MIT licensed, minimal deps, embeddable.

Repo: https://github.com/llm-use/llm-use

Feedback welcome on:
• Routing heuristics you’d find useful
• Pain points with agent costs / local vs cloud
• Missing integrations?

Thanks!

Similar Projects

AI/ML · ●● Solid

AgentForge – Multi-LLM Orchestrator in 15KB of Python

AgentForge compresses common production patterns—token-aware rate limiting (token-bucket), retry+exponential backoff, prompt templates and cost tracking—into a tiny async core and lets you flip providers with one parameter. The multi-agent mesh and ReAct loop bits are the most interesting engineering bets here, and the repo includes benchmarks and a Streamlit demo, but it lives in a crowded space next to LangChain and similar toolkits so real differentiation will come from adoption and edge-case robustness.
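For readers unfamiliar with the retry + exponential backoff pattern mentioned here, a generic sketch follows. It is not AgentForge's implementation; `retry_with_backoff` and its parameters are hypothetical names chosen for illustration.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def retry_with_backoff(
    fn: Callable[[], T],
    retries: int = 3,
    base_delay: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fn; after each failure wait base_delay * 2**attempt, then retry."""
    for attempt in range(retries + 1):
        try:
            return fn()
        except Exception:
            if attempt == retries:
                raise  # out of retries: surface the last error
            sleep(base_delay * 2 ** attempt)
    raise AssertionError("unreachable")


# Demo: a call that fails twice, then succeeds. Delays are recorded, not slept.
calls = {"n": 0}


def flaky() -> str:
    calls["n"] += 1
    if calls["n"] < 3:
        raise ConnectionError("transient failure")
    return "ok"


delays: list = []
result = retry_with_backoff(flaky, sleep=delays.append)
print(result, delays)
```

Passing `sleep` as a parameter keeps the helper testable without real waiting; production code would usually also add jitter and retry only on specific exception types.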

Niche Gem · Ship It
chunktort
21 · 3mo ago
AI/ML · ●● Solid

AgentForge – Multi-LLM Orchestrator in 15KB

AgentForge packs provider adapters (Claude, GPT-4, Gemini, Perplexity), token-aware rate limiting, retry/backoff, and a MockLLMClient for tests into a tiny dependency surface; the 15KB footprint and 2 dependencies are attention-grabbers. The 3-tier Redis cache and benchmark claims (huge latency/memory wins vs LangChain, 88% cache hit rate) make it a tempting low-overhead alternative, though you should validate provider feature parity and the benchmarks against your own workload.

Dark Horse · Wizardry
chunktort
10 · 3mo ago