GitHub Repository

Speaker diarization in Rust. 312–912x realtime on Apple Silicon, 50–121x on CUDA. Matches pyannote accuracy.

78 stars · Rust

Speakrs: Full pyannote pipeline in Rust/ONNX, 20-37x faster on macOS

Author: praveenperera

by praveenperera · May 26, 2026 · 2 points · 0 comments


AI Analysis

●●● Banger · Wizardry · Solve My Problem · Niche Gem

CoreML-powered diarization that's up to 37x faster than pyannote on Apple Silicon.

Strengths

• Verifiable benchmark: 7.1% DER (diarization error rate) on VoxConverse, on par with pyannote's 7.2%
• Zero Python runtime dependency: pure Rust library with CoreML model exports
• Background queue system enables batch processing with full CPU/GPU/ANE utilization
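
For readers unfamiliar with the metric, DER is conventionally the sum of false-alarm, missed-speech, and speaker-confusion time divided by the total reference speech time, so lower is better. A minimal sketch of that formula follows; the `DerParts` struct and `der` function are hypothetical names for illustration, not part of speakrs, and the numbers used with them below are made up rather than taken from the benchmark.

```rust
/// Error durations in seconds for one evaluation run.
/// Hypothetical type for illustration; not part of the speakrs API.
pub struct DerParts {
    /// Time where the system reports speech but the reference has none.
    pub false_alarm: f64,
    /// Time where the reference has speech but the system reports none.
    pub missed: f64,
    /// Time where speech is detected but attributed to the wrong speaker.
    pub confusion: f64,
    /// Total duration of speech in the reference annotation.
    pub total_reference: f64,
}

/// Diarization error rate as a fraction (0.071 means 7.1%).
/// Returns `None` when there is no reference speech to normalize by.
pub fn der(parts: &DerParts) -> Option<f64> {
    if parts.total_reference <= 0.0 {
        return None;
    }
    Some((parts.false_alarm + parts.missed + parts.confusion) / parts.total_reference)
}
```

With illustrative values of 2.0 s false alarm, 3.0 s missed, and 2.1 s confusion over 100 s of reference speech, `der` gives roughly 0.071, which would be reported as 7.1% DER.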

Weaknesses

• Linux/CUDA sees only a 2-3x speedup, not the dramatic macOS gains
• Narrow audience: speaker diarization is a specific ML niche

Post Description

Speakrs implements the full pyannote community-1 style diarization pipeline in Rust: segmentation, powerset decode, overlap-add aggregation, binarization, embedding, PLDA, and VBx clustering.
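
To make three of those stages concrete, here is a rough sketch of powerset decode, overlap-add aggregation, and binarization. It is not the speakrs implementation. It assumes a powerset layout of 3 local speakers with at most 2 active at once (7 classes, ordered by set size), averages overlapping windows frame by frame, and uses a single threshold where a real pipeline may use separate onset and offset thresholds. It also skips the step of matching local speakers across windows. All function names are hypothetical.

```rust
/// Assumed powerset layout: none, each single speaker, then each pair.
const POWERSET: [[bool; 3]; 7] = [
    [false, false, false],
    [true, false, false],
    [false, true, false],
    [false, false, true],
    [true, true, false],
    [true, false, true],
    [false, true, true],
];

/// Powerset decode: pick the highest-scoring class per frame and expand it
/// into one active/inactive flag per local speaker.
pub fn powerset_decode(frames: &[[f32; 7]]) -> Vec<[bool; 3]> {
    frames
        .iter()
        .map(|frame| {
            let mut best = 0;
            for (i, &score) in frame.iter().enumerate() {
                if score > frame[best] {
                    best = i;
                }
            }
            POWERSET[best]
        })
        .collect()
}

/// Overlap-add aggregation: window `w` starts at global frame `w * step`;
/// each global frame gets the mean of every window score that covers it.
pub fn overlap_add(windows: &[Vec<f32>], step: usize) -> Vec<f32> {
    let total = windows
        .iter()
        .enumerate()
        .map(|(w, scores)| w * step + scores.len())
        .max()
        .unwrap_or(0);
    let mut sum = vec![0.0f32; total];
    let mut count = vec![0u32; total];
    for (w, scores) in windows.iter().enumerate() {
        for (f, &score) in scores.iter().enumerate() {
            sum[w * step + f] += score;
            count[w * step + f] += 1;
        }
    }
    sum.iter()
        .zip(&count)
        .map(|(&s, &c)| if c > 0 { s / c as f32 } else { 0.0 })
        .collect()
}

/// Binarization: a frame counts as active when its score exceeds `threshold`.
pub fn binarize(scores: &[f32], threshold: f32) -> Vec<bool> {
    scores.iter().map(|&s| s > threshold).collect()
}
```

In this sketch, a frame whose highest score falls on class 4 decodes to speakers 0 and 1 both being active, which is how a powerset model represents overlapped speech without a separate overlap detector.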

There is no Python runtime in the library path. Inference runs on ONNX Runtime or native CoreML, and the rest of the pipeline stays in Rust.

It is 20x-30x faster than pyannote on macOS, but only 2-3x faster on Linux/CUDA (depending on the CPU).

A few reasons it's faster:

1. Speakrs uses CoreML versions of the models, which I exported specifically to run on CoreML. pyannote just runs the same PyTorch models through MPS (Metal) on macOS.

2. pyannote is not a single model; it's a few different models put together in a pipeline. The README has some info on the full pipeline.

3. Speakrs optimizes the pipeline so different parts can run on the CPU, Neural Engine, and GPU. Speakrs also has a batch mode that runs on multiple files at once, which lets you keep the CPU, GPU, and ANE all fully utilized.
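
The batch idea in point 3 can be sketched with plain threads and channels: each stage runs on its own thread, so while one file is being embedded the next is already being segmented. This is only an illustration of the pattern, not how speakrs is written. The `run_batch` function is hypothetical and the stage bodies are string placeholders standing in for real inference and clustering.

```rust
use std::sync::mpsc;
use std::thread;

/// Push a batch of files through three overlapping stages.
/// Each stage is a placeholder that tags the item with what was done to it.
pub fn run_batch(files: Vec<String>) -> Vec<String> {
    let (seg_tx, seg_rx) = mpsc::channel::<String>();
    let (emb_tx, emb_rx) = mpsc::channel::<String>();

    // Stage 1: segmentation (imagine this running on one accelerator).
    let segmenter = thread::spawn(move || {
        for file in files {
            seg_tx.send(format!("{file}:segmented")).unwrap();
        }
        // `seg_tx` is dropped here, which closes the channel for stage 2.
    });

    // Stage 2: embedding (imagine this on a different accelerator).
    let embedder = thread::spawn(move || {
        for item in seg_rx {
            emb_tx.send(format!("{item}:embedded")).unwrap();
        }
        // `emb_tx` is dropped here, which ends the loop in stage 3.
    });

    // Stage 3: clustering on the calling thread (CPU work).
    let results: Vec<String> = emb_rx
        .iter()
        .map(|item| format!("{item}:clustered"))
        .collect();

    segmenter.join().unwrap();
    embedder.join().unwrap();
    results
}
```

With a single file there is nothing to overlap, which is why a batch of several files is what keeps every stage busy at the same time.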

This is why it's not that much faster on Linux/CUDA: pyannote is already optimized to run on CUDA. The speed improvement we get there comes from running some stages on the CPU while others run on the GPU, so the speedup on Linux depends on how powerful the CPU is.
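
A toy model shows why the CPU matters here. If CPU and GPU stages run back to back, the time per file is their sum; if they overlap across files, the slower stage sets the pace. The sketch below captures only that overlap effect under the assumption of two stages with fixed per-file costs. It ignores every other difference between speakrs and pyannote, and the function names are hypothetical.

```rust
/// Time per file when the CPU stage and GPU stage run one after the other.
pub fn sequential_secs(cpu_secs: f64, gpu_secs: f64) -> f64 {
    cpu_secs + gpu_secs
}

/// Steady-state time per file when the two stages overlap across files:
/// the slower stage is the bottleneck.
pub fn overlapped_secs(cpu_secs: f64, gpu_secs: f64) -> f64 {
    cpu_secs.max(gpu_secs)
}

/// Speedup from overlapping alone. Inputs are expected to be positive.
pub fn overlap_speedup(cpu_secs: f64, gpu_secs: f64) -> f64 {
    sequential_secs(cpu_secs, gpu_secs) / overlapped_secs(cpu_secs, gpu_secs)
}
```

In this model the gain is largest when the two stages take about the same time, and it shrinks when a slow CPU stage dominates, since the GPU then spends most of its time waiting.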

There is also a fast mode that sacrifices some accuracy for speed. It can be up to 50x faster, and for some types of audio it doesn't sacrifice much accuracy. The benchmarks have more info on this.