GitHub Repository

Metal collective communication library (PyTorch DDP)

7 stars · C++

MCCL: distributed PyTorch training across MacBooks via Thunderbolt

by sassoshots44 · Mar 21, 2026 · 1 point · 0 comments

AI Analysis

●● Solid · Wizardry · Niche Gem · Ship It

Two MacBooks syncing gradients over Thunderbolt: slower than single-GPU training, but it works.

Strengths
  • Fills genuine gap: PyTorch MPS multi-process collectives didn't exist before this.
  • Honest benchmarks admit it is 10x slower, a rare show of transparency in ML infrastructure projects.
  • Uses vDSP reductions and Metal for fp16/bf16 with overlapped TCP transport.
Weaknesses
  • Only tested on 2 nodes — no validation for larger clusters or production workloads.
  • Performance is worse than single-GPU, limiting real-world utility.
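
To make the two-node gradient sync described above concrete, here is a minimal sketch of a two-rank all-reduce that averages gradients over a socket. It is an illustration only, not MCCL's implementation: the function names (`allreduce_mean`, `run_two_ranks`) are hypothetical, it uses only the Python standard library, and it stands in a local socket pair and plain float lists for the Thunderbolt TCP link and Metal/vDSP buffers.

```python
import socket
import struct
import threading


def send_floats(sock, values):
    """Send a length-prefixed list of float64 values."""
    payload = struct.pack(f"<{len(values)}d", *values)
    sock.sendall(struct.pack("<I", len(values)) + payload)


def recv_exact(sock, n):
    """Read exactly n bytes; a single recv() may return fewer."""
    buf = bytearray()
    while len(buf) < n:
        chunk = sock.recv(n - len(buf))
        if not chunk:
            raise ConnectionError("peer closed the connection")
        buf.extend(chunk)
    return bytes(buf)


def recv_floats(sock):
    """Receive a list written by send_floats."""
    (count,) = struct.unpack("<I", recv_exact(sock, 4))
    return list(struct.unpack(f"<{count}d", recv_exact(sock, 8 * count)))


def allreduce_mean(sock, local_grads):
    """Two-rank all-reduce: swap gradients with the peer, then average.

    Hypothetical helper. Sending before receiving is only safe for small
    payloads that fit in the socket buffer; a real transport would
    overlap the send and receive to avoid blocking.
    """
    send_floats(sock, local_grads)
    peer_grads = recv_floats(sock)
    return [(a + b) / 2.0 for a, b in zip(local_grads, peer_grads)]


def run_two_ranks(grads_rank0, grads_rank1):
    """Simulate two ranks as threads joined by a local socket pair."""
    sock0, sock1 = socket.socketpair()
    results = [None, None]

    def worker(rank, sock, grads):
        results[rank] = allreduce_mean(sock, grads)

    threads = [
        threading.Thread(target=worker, args=(0, sock0, grads_rank0)),
        threading.Thread(target=worker, args=(1, sock1, grads_rank1)),
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    sock0.close()
    sock1.close()
    return results


if __name__ == "__main__":
    r0, r1 = run_two_ranks([1.0, 2.0], [3.0, 4.0])
    print(r0)  # [2.0, 3.0]
    print(r1)  # [2.0, 3.0]
```

After the exchange both ranks hold the same averaged gradients, which is the property data-parallel training relies on. The per-step network round trip is also one plausible reason a two-node setup can end up slower than a single device.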
Target Audience

ML researchers with multiple Macs experimenting with distributed training

Similar To

NCCL · Gloo · PyTorch Distributed

Similar Projects

AI/ML · ●● Solid

Neural Abyss – PyTorch multi-agent combat simulator

Per-agent PPO runtime with tensor-first simulation state is genuinely clever architecture.

Big Brain · Niche Gem
luthor190397
102mo ago