Dust: Device Unified Serving Toolkit (CUDA for Phones)
CUDA for phones: native runtimes, thin bridges, real demos shipping GGUF and ONNX inference.
Unified Swift SDK for LLM inference across local and cloud providers
Actor-first Swift SDK eliminates vendor lock-in with compile-time safety, unlike LangChain.
Swift developers building AI applications
LangChain · Continue.dev · LlamaIndex
The interesting decision was going actor-first from day one. Every provider is a Swift actor. You get data-race freedom enforced at compile time, not by convention. Swift 6.2's strict concurrency makes this a hard guarantee, not a README promise. LangChain can't say that.
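As a rough illustration of the actor-first idea, here is a minimal sketch of a provider modelled as a Swift actor. The names `TextProvider`, `EchoProvider`, and `runConcurrently` are hypothetical and are not taken from Conduit's API; the point is only that mutable provider state lives behind actor isolation, so many concurrent callers cannot race on it.

```swift
// Hypothetical names throughout; this is not Conduit's actual API.
protocol TextProvider: Actor {
    func generate(_ prompt: String) async throws -> String
}

actor EchoProvider: TextProvider {
    // Mutable state is isolated to the actor, so concurrent callers cannot race on it.
    private(set) var requestCount = 0

    func generate(_ prompt: String) async throws -> String {
        requestCount += 1
        return "echo: \(prompt)"
    }
}

// Fire many concurrent requests at a single provider instance,
// then read back how many requests it handled.
func runConcurrently(_ provider: EchoProvider, times: Int) async throws -> Int {
    try await withThrowingTaskGroup(of: String.self) { group in
        for i in 0..<times {
            group.addTask { try await provider.generate("request \(i)") }
        }
        try await group.waitForAll()
    }
    return await provider.requestCount
}
```

Reading `requestCount` from outside the actor requires `await`; dropping it is a compile error rather than a latent race.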
The part I'm most proud of: @Generable
@Generable
struct FlightSearch {
    @Guide(description: "Origin airport code")
    let origin: String

    @Guide(description: "Departure date", .format(.date))
    let date: Date

    @Guide(.range(1...9))
    let passengers: Int
}

let result = try await provider.generate(
    "Book me a flight to Tokyo next Friday",
    model: .claude3_5Sonnet,
    returning: FlightSearch.self
)
The macro expands at compile time (via swift-syntax) to generate JSON Schema, streaming partial types, and all conversion boilerplate. The API is deliberately aligned with Apple's new Foundation Models framework, so the same struct works against on-device Apple models on iOS 26 and against Claude or GPT-4 with zero changes.
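To make the "conversion boilerplate" concrete, here is a hand-written sketch of the kind of code a schema macro could plausibly emit. It is not the macro's real output: `PropertySchema`, `ObjectSchema`, `FlightQuery`, and `decodeReply` are hypothetical names, and the date field is left out to keep the example short.

```swift
import Foundation

// Hand-written stand-in for what a schema macro could emit.
// Every type and property name below is hypothetical.
struct PropertySchema: Encodable, Equatable {
    let type: String
    var description: String? = nil
    var minimum: Int? = nil
    var maximum: Int? = nil
}

struct ObjectSchema: Encodable {
    let type = "object"
    let properties: [String: PropertySchema]
    let required: [String]
}

struct FlightQuery: Decodable {
    let origin: String
    let passengers: Int

    // Mirrors the stored properties and their guides by hand.
    static var schema: ObjectSchema {
        ObjectSchema(
            properties: [
                "origin": PropertySchema(type: "string", description: "Origin airport code"),
                "passengers": PropertySchema(type: "integer", minimum: 1, maximum: 9),
            ],
            required: ["origin", "passengers"]
        )
    }
}

// Decode a model's JSON reply into the typed struct.
func decodeReply<T: Decodable>(_ json: String, as type: T.Type) throws -> T {
    try JSONDecoder().decode(T.self, from: Data(json.utf8))
}
```

Writing this by hand for every type is exactly the repetition a macro removes: the schema and the struct cannot drift apart when one is generated from the other.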
On-device is a first-class citizen, not an afterthought
Most Swift AI SDKs treat cloud as the primary path and shim local models in awkwardly. Conduit treats MLX, llama.cpp, Core ML, and Apple's Foundation Models as fully equal providers. A ChatSession configured with an MLX Llama model and one configured with GPT-4o are indistinguishable at the call site.
Trait-based compilation keeps binary size sane
AsyncThrowingStream all the way down. Cancellation works via standard Swift task cancellation, with no special teardown protocol. Back-pressure is handled naturally by the async iterator.
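The streaming pattern described above can be sketched with the standard library alone. `tokenStream` and `collect` are hypothetical helpers, not Conduit functions, and the sleeping producer stands in for a real model or network source. The sketch shows how `onTermination` ties the producer's lifetime to the stream's, so cancelling the consuming task also stops production.

```swift
// Hypothetical token source; a real provider would read from a network or model runtime.
func tokenStream(_ tokens: [String]) -> AsyncThrowingStream<String, Error> {
    AsyncThrowingStream { continuation in
        let producer = Task {
            for token in tokens {
                // Stop producing once the stream has been terminated.
                if Task.isCancelled { break }
                continuation.yield(token)
                try? await Task.sleep(nanoseconds: 1_000_000)
            }
            continuation.finish()
        }
        // Invoked when the stream terminates, including when the consuming task is cancelled.
        continuation.onTermination = { _ in producer.cancel() }
    }
}

// Consume up to `limit` tokens with an ordinary for-await loop.
func collect(_ stream: AsyncThrowingStream<String, Error>, limit: Int) async throws -> [String] {
    var out: [String] = []
    for try await token in stream {
        out.append(token)
        if out.count == limit { break }
    }
    return out
}
```

One caveat worth stating: a continuation-based `AsyncThrowingStream` buffers yielded elements, so this sketch demonstrates cancellation rather than strict back-pressure. A pull-based source, such as the `unfolding:` initializer, is one way to have the consumer pace the producer.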
12 providers, one interface
Anthropic, OpenAI, Azure OpenAI, Ollama, OpenRouter, Kimi, MiniMax, HuggingFace Hub, MLX, llama.cpp, Core ML, Foundation Models. The OpenAI-compatible ones share a single OpenAIProvider actor; the named variants are thin configuration wrappers, not code forks.
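One way to picture "thin configuration wrappers, not code forks" is a single actor parameterised by a small config value. The sketch below is an assumption about the general shape, not Conduit's source: `OpenAICompatibleConfig` and `OpenAICompatibleProvider` are hypothetical names, and the base URLs are included as illustrative presets only.

```swift
import Foundation

// All names here are hypothetical illustrations, not Conduit's real types.
struct OpenAICompatibleConfig: Sendable {
    let name: String
    let baseURL: URL
}

extension OpenAICompatibleConfig {
    // Named variants are preset values only; no provider logic is duplicated.
    static let openAI = OpenAICompatibleConfig(
        name: "OpenAI",
        baseURL: URL(string: "https://api.openai.com/v1")!
    )
    static let openRouter = OpenAICompatibleConfig(
        name: "OpenRouter",
        baseURL: URL(string: "https://openrouter.ai/api/v1")!
    )
}

actor OpenAICompatibleProvider {
    let config: OpenAICompatibleConfig

    init(config: OpenAICompatibleConfig) {
        self.config = config
    }

    // Shared request-building logic; actually sending the request is omitted here.
    func chatCompletionsURL() -> URL {
        config.baseURL
            .appendingPathComponent("chat")
            .appendingPathComponent("completions")
    }
}
```

Adding another compatible backend then means adding one more preset, while the request logic stays in a single place.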
https://github.com/christopherkarani/Conduit
Happy to dig into the actor model approach, the macro expansion strategy, or why wrapping LangChain was never an option.
Clean Swift wrapper for Gemma 4 with vision and audio on iPhone.
Custom Metal shaders beat llama.cpp and MLX: 1.67x faster on M4 Max.
LLaMA-Factory for agent memory with native GRPO and 14% performance gains.
LangChain alternative, but LiteLLM and LlamaIndex already cover this.
They combined typed payload validation, per-device sequential mailboxes and JS Actions running inside a WASM sandbox, a practical feature set that actually improves predictability for fleet code. The tiny Rust agent plus Python/JS/Elixir SDKs and an Elixir+DuckDB backend signal thoughtful infra choices rather than vaporware. Nice UX on the landing page, but the space is crowded; integrations and real-world scale will determine whether this stands out.