Dust – Device Unified Serving Toolkit (CUDA for Phones)
CUDA for phones: native runtimes, thin bridges, real demos shipping GGUF and ONNX inference.
Android 16 fork. AI as a platform primitive. Twelve capabilities, one shared runtime, every app. OEM-pluggable. Apache 2.0.
A shared inference runtime at the OS level saves RAM compared to per-app model bundling, because each model is loaded once rather than once per app.
Android system developers, privacy-focused ROM builders, on-device ML engineers
Android NNAPI · Apple Core ML · Qualcomm AI Engine
It adds a system service in system_server, a native daemon, pluggable inference backends, and a Binder AIDL interface for capability-based calls across text, audio, and vision.
The goal is to centralize things that are otherwise handled independently by apps, like model residency, scheduling, fairness, and backend routing.
The current interface exposes 12 capabilities including completion, translation, rerank, embeddings, transcription, synthesis, VAD, OCR, detection, and description.
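To make the capability-based call model more concrete, here is a minimal sketch in Python. It is not the project's AIDL interface or code: `Capability`, `SharedRuntime`, `register_backend`, `call`, and the app IDs are hypothetical names chosen for illustration, and the "backend" is a plain function standing in for a real inference engine. It only models the idea described above: apps name a capability, a single shared runtime routes the call to whichever backend is registered, and a model becomes resident once no matter how many apps use it.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Callable, Dict, Set


class Capability(Enum):
    # The ten capabilities named in the post; the other two are not listed there.
    COMPLETION = "completion"
    TRANSLATION = "translation"
    RERANK = "rerank"
    EMBEDDINGS = "embeddings"
    TRANSCRIPTION = "transcription"
    SYNTHESIS = "synthesis"
    VAD = "vad"
    OCR = "ocr"
    DETECTION = "detection"
    DESCRIPTION = "description"


@dataclass
class SharedRuntime:
    """Hypothetical stand-in for the system service: one instance serves every app."""

    backends: Dict[Capability, Callable[[str], str]] = field(default_factory=dict)
    loaded: Set[Capability] = field(default_factory=set)
    load_count: int = 0  # how many times a model had to be made resident
    calls_by_app: Dict[str, int] = field(default_factory=dict)  # basis for fairness accounting

    def register_backend(self, capability: Capability, backend: Callable[[str], str]) -> None:
        # Backend routing: any backend can be plugged in per capability.
        self.backends[capability] = backend

    def call(self, app_id: str, capability: Capability, payload: str) -> str:
        backend = self.backends.get(capability)
        if backend is None:
            raise LookupError(f"no backend registered for {capability.value}")
        # Model residency: load on first use only, then share across apps.
        if capability not in self.loaded:
            self.loaded.add(capability)
            self.load_count += 1
        self.calls_by_app[app_id] = self.calls_by_app.get(app_id, 0) + 1
        return backend(payload)


runtime = SharedRuntime()
# Toy backend (an assumption for the demo): upper-casing stands in for inference.
runtime.register_backend(Capability.COMPLETION, lambda text: text.upper())

first = runtime.call("com.example.notes", Capability.COMPLETION, "hello")
second = runtime.call("com.example.mail", Capability.COMPLETION, "world")
```

In this sketch two different apps call the same capability, yet `load_count` stays at 1, which is the RAM argument in miniature. A real implementation would sit behind Binder and would also need scheduling and eviction, which are left out here.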
Repo: https://github.com/Jibar-OS/JibarOS
Interested in feedback from anyone who has worked on Android framework/services, ML runtimes, or device-level resource scheduling.
Yet another Excalidraw backend when the official version already has cloud saving and collab.
C# bindings for Google's LiteRT-LM fill the .NET Android on-device LLM gap.
O(1) fork latency makes tree search 1000x faster than vLLM for agentic workloads.
GitHub for prompts, but PromptBase and countless repos already do this.
Cryptographic audit trail for ML inference when ONNX Runtime can't prove what computed.