Granite Switch - compose multiple LoRA adapters into one deployable model
Composing multiple LoRA adapters into one checkpoint solves the model sprawl nightmare.
A memory and execution optimization architecture for AI models
3.59ms for 100 LoRA adapters with zero HBM writes—genuine GPU wizardry.
ML infrastructure engineers, LLM serving teams
LoRAX · vLLM · Punica
Stores memory in LoRA weights instead of cache, but lacks working benchmarks.
Self-bootstrapping agent writes its own improvements in 100 lines of TypeScript.
Thin CLI wrapper around deploybase.ai when the website already shows this data.
Agent runtime infra, but 0 stars and crowded with LangGraph and Temporal.
Linux finally gets offline voice typing; Ctrl-tap + Vulkan GPU support vs cloud-dependent alternatives.