Clangd for CUDA Device Code
Compile CUDA for AMD GPUs with zero code changes—breaks NVIDIA lock-in.

Installs flash-attn in 9 seconds instead of 45 minutes — no cmake, no CUDA hell.
ML engineers, data scientists, AI researchers using PyTorch/CUDA
PyPI · conda-forge · NVIDIA NGC
Compile CUDA for AMD GPUs with zero code changes—breaks NVIDIA lock-in.
First MCP server letting Claude directly flash and debug STM32 boards from chat.
JSON-defined ML architectures in Go beat Python boilerplate for rapid iteration.
Solves a real build problem on jailbroken iPhone, but the audience is ~5 people and there's no actual product.
Wraps existing libraries so AI can't touch the filesystem unless you explicitly allow it.
Convenient cross-platform builds of Alula's decomp, but just a Drive folder with no installer.