Black Forest Labs CLI – let coding agents paint
API wrapper with agent-friendly flags—but FLUX, Replicate, and Fal already have CLIs.
GPU ML architecture exploration tool
JSON-defined ML architectures in Go beat Python boilerplate for rapid iteration.
ML researchers and engineers prototyping architectures
MLX · PyTorch Lightning · Hydra
Why: I wanted to compare attention vs Mamba vs GQA at different parameter budgets without writing PyTorch for each experiment. Edit a JSON config, hit enter, see loss numbers. It will race different configs for you. The number one goal is iteration speed.
A JSON config chains together common ML blocks (attention, GQA, Mamba, RetNet, and several more) and optimizers (Muon, AdamW). mixlab compiles the config to MLX IR, which can run on either the Metal or the CUDA backend.
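To make the idea of a JSON-defined block chain concrete, here is a minimal Go sketch of how such a config could be decoded and validated. The schema is hypothetical: the field names (`dim`, `optimizer`, `blocks`, `type`, `heads`) and the `parseConfig` helper are assumptions made for illustration, not mixlab's actual format or API.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Block is one entry in the chain. Hypothetical schema: these field
// names are illustrative and are not taken from mixlab.
type Block struct {
	Type  string `json:"type"`
	Heads int    `json:"heads,omitempty"`
}

// Config is a hypothetical top-level experiment description.
type Config struct {
	Dim       int     `json:"dim"`
	Optimizer string  `json:"optimizer"`
	Blocks    []Block `json:"blocks"`
}

// example mixes block types the way the description above suggests.
const example = `{
  "dim": 384,
  "optimizer": "adamw",
  "blocks": [
    {"type": "attention", "heads": 6},
    {"type": "mamba"},
    {"type": "gqa", "heads": 6}
  ]
}`

// parseConfig decodes the JSON and rejects an empty block chain.
func parseConfig(data []byte) (Config, error) {
	var c Config
	if err := json.Unmarshal(data, &c); err != nil {
		return Config{}, fmt.Errorf("decode config: %w", err)
	}
	if len(c.Blocks) == 0 {
		return Config{}, fmt.Errorf("config has no blocks")
	}
	return c, nil
}

func main() {
	c, err := parseConfig([]byte(example))
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	for i, b := range c.Blocks {
		fmt.Printf("block %d: %s\n", i, b.Type)
	}
}
```

Swapping one architecture for another in a layout like this is a one-line edit to the `blocks` array, which is the kind of iteration loop the description is aiming for.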
Why Go: 1.6s builds, built-in profiling (mixlab -cpuprofile gives you a flame graph), import-based extensibility for custom blocks. No C++ extensions, no custom build systems. And personally I prefer strongly typed, compiled languages.
On a Shakespeare benchmark matching nanoGPT (6L, 6H, d=384, 10.8M params): val loss 1.5527 on M1 Max, 1.5588 on A40. PyTorch numerical parity confirmed to 8 decimal places.
brew install mrothroc/tap/mixlab
CUDA pipeline hits 60 FPS on 45MP RAW files, competing with Darktable.
Metal GPU stress testing in terminal, but is the workload realistic for benchmarking?
Type-safe JSON migrations inspired by database patterns—clean API but crowded space.
Drop a merchant-config.json into the CLI and you get 13 ready-made UCP endpoints — catalog, checkout lifecycle, orders, plus OAuth metadata — with Stripe/PayPal and SQLite wired up. The built-in OAuth identity-linking for agent workflows is the most interesting move; it's a focused, practical tool for prototyping UCP-compliant stores but not a substitute when you need bespoke business logic or complex fulfillment.
GPU-accelerated JSON parsing with 91ns selective queries versus simdjson's 24ms.