LLMForge – Orchestrate your LLM pipeline. Locally
Full LLM pipeline in one window when LM Studio only does inference.
Unified pipeline for GGUF and AWQ quantization without the ecosystem headache.
ML Engineers, Edge AI Developers
llama.cpp · AutoAWQ · MLC LLM
I'm building Qwodel, an open-source pipeline that automates the fragmented mess of LLM quantization.
If you've ever tried to prep a Hugging Face model for edge deployment or cheaper cloud inference, you know the drill: wrestling llm_compressor for AWQ, writing ctypes calls into llama.cpp for GGUF, or fighting memory leaks in coremltools for Apple Silicon.
Qwodel acts as a unified orchestration engine. Instead of context-switching between three different ecosystems, you pass in the model; we handle memory chunking and edge-case graph conversions, then output production-ready formats (GGUF, AWQ, CoreML).
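The "pass in the model, get a format out" idea can be sketched as a registry that maps each output format to one backend function behind a single entry point. This is a minimal, hypothetical sketch and not Qwodel's actual API. `quantize`, `register`, and the backend functions are illustrative names. The backends are placeholders that only compute an output path; they do not call llama.cpp, an AWQ quantizer, or coremltools.

```python
from pathlib import Path
from typing import Callable, Dict

# Hypothetical backend signature: (model_dir, out_dir) -> path of the produced artifact.
Backend = Callable[[Path, Path], Path]

_BACKENDS: Dict[str, Backend] = {}


def register(fmt: str) -> Callable[[Backend], Backend]:
    """Decorator that registers a backend under a (case-insensitive) format name."""
    def wrap(fn: Backend) -> Backend:
        _BACKENDS[fmt.lower()] = fn
        return fn
    return wrap


@register("gguf")
def _to_gguf(model_dir: Path, out_dir: Path) -> Path:
    # Placeholder: a real backend would drive llama.cpp's conversion tooling here.
    return out_dir / f"{model_dir.name}.gguf"


@register("awq")
def _to_awq(model_dir: Path, out_dir: Path) -> Path:
    # Placeholder: a real backend would run an AWQ quantizer here.
    return out_dir / f"{model_dir.name}-awq"


@register("coreml")
def _to_coreml(model_dir: Path, out_dir: Path) -> Path:
    # Placeholder: a real backend would call coremltools here.
    return out_dir / f"{model_dir.name}.mlpackage"


def quantize(model_dir: str, fmt: str, out_dir: str = "out") -> Path:
    """Single entry point: look up the backend for `fmt` and run it."""
    try:
        backend = _BACKENDS[fmt.lower()]
    except KeyError:
        raise ValueError(
            f"unsupported format {fmt!r}; choose from {sorted(_BACKENDS)}"
        ) from None
    return backend(Path(model_dir), Path(out_dir))
```

For example, `quantize("models/Qwen2.5-0.5B", "gguf")` returns `Path("out/Qwen2.5-0.5B.gguf")`. With this design, adding a new backend means registering one more function rather than editing the entry point.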
We update the package weekly, adding new model architectures and backend optimizations. You can find the full reference guide at docs.qwodel.com.
The project is entirely open-source. We would love for you to test it out, tear the architecture apart, and let us know where it breaks. We welcome pull requests, so feel free to report bugs or contribute directly in the repo!
Backpressured pipeline with 60-80% dedup savings beats chatty multi-agent frameworks.
Custom GGUF parser with mmap beats llama.cpp load times, but with zero stars, the claims are unproven.
Git worktree isolation enables parallel AI sessions without merge conflicts.
Graph-based LLM pipelines for Java, but LangChain4j already dominates and is more mature across the same use cases.
Smart local-first routing that escalates to expensive cloud planners only when necessary is the standout idea. Combined with per-run cost accounting and full Ollama offline support, it solves a real operational itch. The repo is a pragmatic, CLI/TUI-focused toolkit (scraping + cache, MCP server mode) that feels useful for teams wanting a no-friction orchestrator. However, it plays in a crowded space of agent frameworks, so the novelty is incremental rather than revolutionary.
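The escalation rule described above can be sketched as a router that calls a local model first and falls back to a cloud model only when the local answer looks unreliable. The router records the cost of every call in a per-run ledger. This is a hypothetical sketch, not the reviewed repo's code. The following are all illustrative assumptions:

* the names `route` and `RunLedger`;
* the confidence score each model returns;
* the 0.7 threshold;
* the flat per-call cloud price.

The model functions stand in for real Ollama or cloud clients.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple

# Hypothetical model interface: prompt -> (answer, confidence in [0, 1]).
ModelFn = Callable[[str], Tuple[str, float]]


@dataclass
class RunLedger:
    """Per-run cost accounting: one (model, cost_usd) entry per call."""
    entries: List[Tuple[str, float]] = field(default_factory=list)

    def charge(self, model: str, cost_usd: float) -> None:
        self.entries.append((model, cost_usd))

    @property
    def total(self) -> float:
        return sum(cost for _, cost in self.entries)


def route(prompt: str, local: ModelFn, cloud: ModelFn, ledger: RunLedger,
          threshold: float = 0.7, cloud_cost_usd: float = 0.01) -> str:
    """Try the local model first; escalate to the cloud model only if it is unsure."""
    answer, confidence = local(prompt)
    ledger.charge("local", 0.0)  # local inference is treated as free in this sketch
    if confidence >= threshold:
        return answer
    answer, _ = cloud(prompt)  # escalation: only reached on low local confidence
    ledger.charge("cloud", cloud_cost_usd)
    return answer
```

Keeping the ledger outside the router lets one `RunLedger` span every call in a run, so the total reflects what that run actually spent.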