Custom ML model – if Spotify was instrumental

by day6·Feb 15, 2026·1 point·0 comments

AI Analysis

Mid · Niche Gem · Ship It
The Take

This ships a tidy, browsable library of on-demand vocal removals backed by a custom 'BiMamba UNet' trained on a curated dataset of ~500 songs, and the interface actually feels built for music discovery rather than as a bare demo. Don't expect SOTA separations (the author trained on a single A40 for three days and is honest about the limits), but the vibe-coded dataset and the integrated conversion pool make it a fun, useful tool for quickly grabbing instrumentals.

Category
Target Audience

Music producers, content creators, DJs, hobbyists and listeners who want instrumental/karaoke versions of songs

Post Description

Come make use of my 1 TB CloudFront free tier and try out my BiMamba UNet model :D

This is not SOTA, but it was also only trained for 3 days on a single A40 (~$30). It was vibe-coded in a week, which was very much possible thanks to combining existing ideas in the space with Opus.
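The quoted budget implies a rough hourly GPU rate, assuming the ~$30 covers only the 72 hours of A40 time and nothing else:

```latex
\frac{\$30}{3\ \text{days} \times 24\ \text{h/day}} = \frac{\$30}{72\ \text{h}} \approx \$0.42\ \text{per A40-hour}
```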

I built a custom dataset of ~500 songs by finding official instrumentals and vibe-code aligning them with the full songs, plus some vibe-coded synthetic snippets that came from prompts like "please get some vocal / voice and instrument textures/datasets and piece them together" and "please generate edge cases like vocaloid filters, or really quiet instrumentals over very loud voices".

I have one GPU running all conversions right now, so new imports might be slow, but once a song is done it should be good forever! (There's also an existing pool of already-converted songs.)
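The queue-and-cache behaviour described above can be sketched as follows, assuming a single worker thread stands in for the one GPU and that a finished conversion is stored under a song ID and never recomputed. `ConversionPool` and the `separate` callback are hypothetical names, not the site's code; repeated requests for a song that is already queued are collapsed into one job.

```python
import queue
import threading
from typing import Callable, Dict, Optional, Set


class ConversionPool:
    """One worker (the single GPU) drains a FIFO queue; finished results are cached forever."""

    def __init__(self, separate: Callable[[str], str]):
        self._separate = separate  # hypothetical: runs the model and returns an output path/URL
        self._jobs: "queue.Queue[str]" = queue.Queue()
        self._done: Dict[str, str] = {}
        self._pending: Set[str] = set()
        self._lock = threading.Lock()
        threading.Thread(target=self._worker, daemon=True).start()

    def request(self, song_id: str) -> Optional[str]:
        """Return the cached result, or enqueue the song (at most once) and return None."""
        with self._lock:
            if song_id in self._done:
                return self._done[song_id]
            if song_id not in self._pending:
                self._pending.add(song_id)
                self._jobs.put(song_id)
            return None

    def _worker(self) -> None:
        while True:
            song_id = self._jobs.get()
            try:
                result = self._separate(song_id)  # only one conversion runs at a time
                with self._lock:
                    self._done[song_id] = result
            except Exception:
                pass  # leave it uncached so a later request can retry
            finally:
                with self._lock:
                    self._pending.discard(song_id)
                self._jobs.task_done()

    def wait(self) -> None:
        """Block until every queued conversion has been processed."""
        self._jobs.join()
```

With a single worker, throughput is bounded by one conversion at a time, which matches the "new imports might be slow" caveat; the cache is what makes repeat requests instant.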

Similar Projects

AI/ML · ●● Solid

Instrumental Model from Scratch (With Demo)

The architecture is the project's real showpiece: a 72-band non‑uniform band-split BiMamba U‑Net that uses Mamba scans for O(T) memory and interleaved attention in the bottleneck to mix cross‑frequency context, a clever tradeoff between temporal efficiency and global attention. The author ships a runnable demo and an explanatory write-up so you can reproduce the approach, but it's clearly hobby-scale (trained on ≈1k songs, a single home-PC queue, slow cold starts), so expect experimental results rather than SOTA separation or instant throughput.
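The two core ideas can be sketched as follows, with assumptions flagged. `band_edges` splits a 1025-bin STFT (n_fft = 2048 at 44.1 kHz, both assumed) into 72 non-uniform bands; mel spacing is an assumption, and the write-up may use a different scheme. `bidirectional_scan` is a fixed-coefficient linear recurrence used as a stand-in for a Mamba scan: the real selective scan makes its coefficients depend on the input and runs as a parallel scan, but this loop shows why the carried state stays constant-size and cost grows O(T), whereas full attention builds a T×T score matrix. All names are hypothetical.

```python
import numpy as np


def band_edges(n_bins: int, n_bands: int, sample_rate: int) -> np.ndarray:
    """Bin indices delimiting mel-spaced bands: band i covers bins edges[i]:edges[i + 1]."""
    f_max = sample_rate / 2
    mel_max = 2595 * np.log10(1 + f_max / 700)
    mels = np.linspace(0, mel_max, n_bands + 1)
    freqs = 700 * (10 ** (mels / 2595) - 1)
    edges = np.round(freqs / f_max * (n_bins - 1)).astype(int)
    edges[-1] = n_bins  # exclusive end: the last band runs through the final bin
    for i in range(1, len(edges)):
        edges[i] = max(edges[i], edges[i - 1] + 1)  # every band gets at least one bin
    return edges


def linear_scan(x: np.ndarray, a: np.ndarray, b: np.ndarray) -> np.ndarray:
    """h_t = a * h_{t-1} + b * x_t along axis 0; only one state vector is carried between steps."""
    h = np.zeros(x.shape[1])
    out = np.empty_like(x, dtype=float)
    for t in range(x.shape[0]):
        h = a * h + b * x[t]
        out[t] = h
    return out


def bidirectional_scan(x: np.ndarray, a: np.ndarray, b: np.ndarray) -> np.ndarray:
    """Sum of a forward scan and a scan over the time-reversed input, so each step sees past and future."""
    return linear_scan(x, a, b) + linear_scan(x[::-1], a, b)[::-1]
```

Narrow low-frequency bands and wide high-frequency bands spend more of the band budget where pitch detail lives, which is the usual motivation for non-uniform band splits.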

Wizardry · Niche Gem
day6
103mo ago
AI/ML · ●● Solid

See your speaker's output on a piano

50 FPS inference on a consumer laptop using Basic-Pitch with cpal audio capture.
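At 50 FPS each frame has a budget of 1/50 s = 20 ms. The Python sketch below shows one way to turn irregular capture callbacks (audio APIs such as cpal hand over buffers of device-chosen size) into analysis windows emitted exactly every 20 ms of audio. It illustrates the buffering only: cpal is a Rust crate, the 22 050 Hz rate is assumed to match Basic-Pitch's model input, the 2048-sample window is a placeholder rather than Basic-Pitch's real input length, and `FrameFeeder` is a hypothetical name.

```python
import numpy as np


class FrameFeeder:
    """Collects arbitrary-size audio chunks and emits a fixed-length window every `hop` samples."""

    def __init__(self, sample_rate: int, fps: int, window: int):
        self.hop = sample_rate // fps  # e.g. 22050 // 50 = 441 samples = 20 ms
        if self.hop > window:
            raise ValueError("window must be at least one hop long")
        self.window = window
        self.buf = np.zeros(window, dtype=np.float32)  # most recent `window` samples
        self.pending = 0  # samples received since the last emitted window

    def push(self, chunk: np.ndarray) -> list:
        """Feed one callback's worth of samples; return the windows that became ready."""
        frames = []
        pos = 0
        while pos < len(chunk):
            take = min(self.hop - self.pending, len(chunk) - pos)
            self.buf = np.concatenate([self.buf[take:], chunk[pos:pos + take]])  # slide forward
            self.pending += take
            pos += take
            if self.pending == self.hop:
                frames.append(self.buf.copy())  # hand this window to the pitch model
                self.pending = 0
        return frames
```

Consecutive windows overlap by `window - hop` samples, so a longer window gives the model more context without changing the 20 ms update rate; the model call itself still has to finish within that budget.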

Niche Gem · Wizardry
ecstrema
301mo ago