Fast Cython/OpenMP-powered 3D volume resampling for NumPy, with PyTorch- and SciPy-compatible nearest, linear, area, cubic, and grid sampling on CPU.

2 stars · Python

Volresample – 3D volume resampling up to 13× faster than PyTorch on CPU

by hojijoji·Feb 28, 2026·1 point·0 comments

AI Analysis

●●● Banger · Wizardry · Niche Gem

Medical imaging resampling up to 13× faster than PyTorch on CPU: genuine performance engineering.

Strengths
  • Pre-computed index tables + branchless loops deliver measurable real-world speedups across multiple modes.
  • Eliminates PyTorch overhead for CPU-only workloads and adds fast int16 and area-mode paths that PyTorch doesn't optimize.
  • Minimal API surface (2 functions) and no PyTorch dependency make adoption friction-free.
Weaknesses
  • Narrow use case (3D volumetric data); won't matter to general ML practitioners.
  • Float32-only for interpolation limits applicability in some medical imaging pipelines.
Target Audience

Medical imaging researchers, computational scientists using Python

Similar To

PyTorch (interpolate/grid_sample) · NumPy · SciPy.ndimage

Post Description

I built a small Cython + OpenMP library for resampling 3D volumes (medical images, etc.). It's an almost drop-in replacement for torch.nn.functional.interpolate and grid_sample, but runs on NumPy arrays and doesn't require PyTorch.

Benchmarks (Intel i7, 4 cores, PyTorch 2.8.0):

  • resample 512³→256³ trilinear: 34 ms vs 55 ms (1.6×)
  • area mode: 65 ms vs 613 ms (9.5×); PyTorch doesn't parallelize this well
  • int16 nearest: 8 ms vs 93 ms (11×); PyTorch has no native int16 path (even 13× on a single thread)
  • grid_sample 128³: 38 ms vs 169 ms (4.4×)

The main wins come from: pre-computed index tables, fused-type specialization (no dtype casting), branchless inner loops, and OpenMP parallelization that actually scales for single-image workloads.
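To illustrate the pre-computed index table idea, here is a rough NumPy sketch for nearest-neighbor resampling. It is not volresample's actual code: the function names are made up for this example, and the `floor(dst * in_size / out_size)` mapping is an assumption based on the convention commonly used for PyTorch-style `nearest` mode. The point is that the source index for every output coordinate is computed once per axis, so the inner loop is a pure gather with no per-voxel arithmetic or bounds branches, and the input dtype (int16 here) is never cast.

```python
import numpy as np

def nearest_index_table(in_size, out_size):
    """Source index for every output index along one axis (hypothetical helper)."""
    scale = in_size / out_size
    # Assumed convention: src = floor(dst * scale), clamped to the last valid index.
    idx = np.floor(np.arange(out_size) * scale).astype(np.intp)
    return np.minimum(idx, in_size - 1)

def resample_nearest(volume, out_shape):
    """Nearest-neighbor resampling of a 3D array using one table per axis."""
    iz, iy, ix = (nearest_index_table(n, m)
                  for n, m in zip(volume.shape, out_shape))
    # np.ix_ builds the open mesh, so this is a single gather; dtype is preserved.
    return volume[np.ix_(iz, iy, ix)]

vol = np.arange(4 * 4 * 4, dtype=np.int16).reshape(4, 4, 4)
small = resample_nearest(vol, (2, 2, 2))
print(small.shape, small.dtype)
```

In a compiled implementation the three tables would typically be filled once and then read inside the parallel loop over output slices; the NumPy fancy-indexing call above stands in for that loop.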

No GPU, no autograd, float32-only for interpolation — just fast CPU resampling with a 2-function API.
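For a sense of what the float32 interpolation path computes, here is a minimal NumPy sketch of trilinear resampling done as three separable 1D passes. It assumes the half-pixel mapping `src = (dst + 0.5) * in_size / out_size - 0.5` used by `torch.nn.functional.interpolate` with `align_corners=False`; the helper names are hypothetical and this is not the library's implementation.

```python
import numpy as np

def linear_table(in_size, out_size):
    """Precompute lower/upper source indices and float32 weights for one axis."""
    scale = in_size / out_size
    # Assumed half-pixel convention (align_corners=False), clamped to the valid range.
    src = (np.arange(out_size, dtype=np.float32) + 0.5) * scale - 0.5
    src = np.clip(src, 0.0, in_size - 1)
    lo = np.floor(src).astype(np.intp)
    hi = np.minimum(lo + 1, in_size - 1)
    w = (src - lo).astype(np.float32)
    return lo, hi, w

def resample_linear_axis(volume, out_size, axis):
    """Linear interpolation along a single axis using the precomputed table."""
    lo, hi, w = linear_table(volume.shape[axis], out_size)
    shape = [1] * volume.ndim
    shape[axis] = -1
    w = w.reshape(shape)  # broadcast the weights along the chosen axis
    a = np.take(volume, lo, axis=axis)
    b = np.take(volume, hi, axis=axis)
    return a * (1 - w) + b * w

def resample_trilinear(volume, out_shape):
    """Trilinear resampling as three separable 1D passes, kept in float32."""
    out = volume.astype(np.float32, copy=False)
    for axis, n in enumerate(out_shape):
        out = resample_linear_axis(out, n, axis)
    return out

vol_f = np.arange(64, dtype=np.float32).reshape(4, 4, 4)
half = resample_trilinear(vol_f, (2, 2, 2))
print(half.shape, half.dtype)
```

With an exact 2× downscale, every output voxel falls midway between two input voxels on each axis, so the result equals the mean of the corresponding 2×2×2 block.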

pip install volresample

GitHub: https://github.com/JoHof/volresample

If you find it interesting, I wrote about the motivation and some implementation details here: https://johof.github.io/2026/02/volresample-3d-volume-resamp...

Similar Projects

AI/ML · ●●● Banger

Mamba3-minimal – PyTorch implementation of Mamba-3

Readable Mamba-3 in pure PyTorch solves the trapezoidal discretization cross-boundary dependency without custom kernels.

Big Brain · Wizardry · Niche Gem
vikramkarlex
103mo ago
AI/ML · ●●● Banger

Diarize – CPU-only speaker diarization, 7x faster than pyannote

Matches pyannote on accuracy, runs 7x faster on CPU, no signup: genuine infrastructure win.

Solve My Problem · Dark Horse
loookas
343mo ago