GitHub Repository

High-performance Rust extensions for Axolotl (no OOM for large datasets) - drop-in acceleration for existing installations.

3 stars · Python

Fast-Axolotl – Rust extensions that make Axolotl fine-tuning 77x faster

by ticktockten · Mar 11, 2026 · 1 point · 0 comments

AI Analysis

●● Solid · Niche Gem · Ship It

77x faster data loading, but it only helps if you already use Axolotl.

Strengths
  • Drop-in acceleration: a single import line, with zero config changes.
  • 77x streaming speedup on 50k rows is benchmarked with specific methodology.
  • Cross-platform wheels for Linux, macOS, Windows with Python 3.10-3.12.
Weaknesses
  • Token packing and batch padding show overhead on small datasets due to FFI costs.
  • Rust acceleration of Python ML pipelines is a well-trodden pattern (Polars, etc.).
Category
Target Audience

ML engineers fine-tuning large language models

Similar To

Polars · Petastorm · WebDataset

Post Description

I built Rust extensions for Axolotl that dramatically speed up data loading and preprocessing for LLM fine-tuning.

The problem: Python data pipelines become the bottleneck when fine-tuning large models. Your GPUs sit idle waiting for data.

The solution: Drop-in Rust acceleration. One import line, zero config changes.

Results on 50k rows:
  • Streaming data loading: 0.009s vs 0.724s (77x faster)
  • Parallel SHA256 hashing: 0.027s vs 0.052s (1.9x faster)
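For a sense of what a pure-Python baseline for these two operations might look like, here is a minimal sketch. It is not the author's benchmark: the synthetic 50k-row dataset, the batch size, and the helper names (stream_jsonl, sha256_row, hash_rows_parallel, timed) are all assumptions made for illustration, and the timings it prints will differ from the figures above.

```python
import hashlib
import io
import json
import time
from concurrent.futures import ThreadPoolExecutor


def stream_jsonl(handle, batch_size=1000):
    """Yield lists of parsed rows so the whole file never sits in memory."""
    batch = []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        batch.append(json.loads(line))
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:
        yield batch


def sha256_row(row):
    """Hash one row via a canonical JSON encoding (sorted keys)."""
    payload = json.dumps(row, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


def hash_rows_parallel(rows, workers=4):
    """Hash rows on a thread pool, preserving input order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(sha256_row, rows))


def timed(fn, *args):
    """Run fn(*args) and return (result, elapsed seconds)."""
    start = time.perf_counter()
    result = fn(*args)
    return result, time.perf_counter() - start


# Synthetic 50k-row dataset held in memory (assumption); a real run would read a file.
data = "\n".join(json.dumps({"id": i, "text": f"sample {i}"}) for i in range(50_000))

batches, load_s = timed(lambda: list(stream_jsonl(io.StringIO(data))))
rows = [row for batch in batches for row in batch]
digests, hash_s = timed(hash_rows_parallel, rows)
print(f"streamed {len(rows)} rows in {load_s:.3f}s, hashed in {hash_s:.3f}s")
```

The thread pool here only mirrors the "parallel" shape of the task. On short rows like these, Python threads are unlikely to give much speedup, which is the kind of gap a native extension can close.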

Works with Parquet, Arrow, JSON, JSONL, CSV. Supports compression. Cross-platform.

Usage:

    import fast_axolotl
    import axolotl  # now accelerated

    pip install fast-axolotl
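If a script has to run both with and without the package installed, one option is to guard the import. The sketch below assumes only what the usage snippet shows, namely that importing fast_axolotl before axolotl is what enables the acceleration. The helper names fast_axolotl_available and enable_acceleration are hypothetical and not part of the package.

```python
import importlib.util


def fast_axolotl_available():
    """Return True if the fast_axolotl package can be found by this interpreter."""
    return importlib.util.find_spec("fast_axolotl") is not None


def enable_acceleration():
    """Import fast_axolotl if it is installed; report whether it was loaded."""
    if not fast_axolotl_available():
        return False
    import fast_axolotl  # noqa: F401  (imported for its side effects, per the post)
    return True


accelerated = enable_acceleration()
if accelerated:
    print("fast_axolotl loaded")
else:
    print("fast_axolotl not installed; falling back to stock Axolotl")

# Per the post's usage snippet, Axolotl itself is imported afterwards:
# import axolotl
```

With this guard, a missing wheel degrades to the unaccelerated path instead of raising ImportError at startup.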

Built with PyO3 and maturin. MIT licensed. Happy to answer questions about the Rust/Python interop or benchmark methodology.

Similar Projects