GitHub Repository

A high-performance, enterprise-grade tool for identifying differences between S3-compatible buckets. Built by Nutanix for production workloads requiring reliable bucket synchronization verification, migration validation, and compliance auditing.

3 stars · Python

Bucket Delta – Compute differences between two S3-compatible buckets

by drstrange14·Apr 7, 2026·2 points·0 comments

AI Analysis

Solid · Solve My Problem · Ship It

Validates ObjectLock and WORM status where aws s3 sync only checks object ETags.

Strengths
  • Checkpointing allows resuming 10M-object jobs without restarting from scratch after an unexpected interruption.
  • Deep check mode validates ObjectLock and WORM status, not just object ETags.
  • Multi-process architecture reaches ~3,900 objects/sec in shallow comparison mode using 25 worker processes.
Weaknesses
  • Python-based concurrency might lag behind Go-based tools like s5cmd for raw speed.
  • Requires managing checkpoint files manually if running across different cloud provider environments.
Target Audience

DevOps engineers, Cloud architects, Data migration teams

Similar To

s5cmd · AWS CLI · rclone

Post Description

We built this at Nutanix to solve a recurring problem: detecting data drift between large object stores (hundreds of millions of objects).

Existing tools like AWS CLI (aws s3 sync --dryrun) and s5cmd (--dry-run) are great for many workflows, but we had slightly different requirements for deeper and more flexible comparisons. That led us to build this tool, and we’ve now open-sourced it for broader use.

Key Features:

1. Bidirectional diff: Given two buckets A and B, the tool reports both (A−B) and (B−A).

2. Shallow check mode: Compares object name and ETag to decide whether an object is present in the other bucket.

3. Deep check mode: Compares tags, metadata, and ObjectLock/WORM via HeadObject.

4. Resumability: Checkpointing allows long-running jobs to resume seamlessly. For example, a 10M-object run interrupted at 8M continues from where it left off.
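The bidirectional shallow comparison and the checkpointing described above can be sketched in a few lines. This is a minimal illustration, not the tool's actual implementation: the in-memory `Listing` dictionaries, the function names `shallow_diff` and `resumable_diff`, the JSON checkpoint layout, and the `stop_after_batches` parameter (used only to simulate an interruption) are all hypothetical. It assumes an object counts as present in the other bucket only when both its key and its ETag match.

```python
import json
import os
import tempfile
from typing import Dict, List, Optional, Tuple

# Hypothetical in-memory listing: object key -> ETag.
Listing = Dict[str, str]


def shallow_diff(a: Listing, b: Listing) -> Tuple[List[str], List[str]]:
    """Return (A-B, B-A); an object is 'present' only if key and ETag both match."""
    a_minus_b = sorted(k for k, etag in a.items() if b.get(k) != etag)
    b_minus_a = sorted(k for k, etag in b.items() if a.get(k) != etag)
    return a_minus_b, b_minus_a


def resumable_diff(
    a: Listing,
    b: Listing,
    checkpoint_path: str,
    batch_size: int = 2,
    stop_after_batches: Optional[int] = None,  # hypothetical: simulates a crash
) -> Tuple[List[str], List[str]]:
    """Same result as shallow_diff, but saves progress after every batch."""
    state = {"last_key": "", "a_minus_b": [], "b_minus_a": []}
    if os.path.exists(checkpoint_path):
        with open(checkpoint_path) as fh:
            state = json.load(fh)  # resume from the last completed batch

    # Walk the union of keys in sorted order, skipping keys already processed.
    pending = sorted(k for k in a.keys() | b.keys() if k > state["last_key"])
    for n, start in enumerate(range(0, len(pending), batch_size)):
        if stop_after_batches is not None and n >= stop_after_batches:
            break
        for key in pending[start:start + batch_size]:
            if a.get(key) != b.get(key):
                if key in a:
                    state["a_minus_b"].append(key)
                if key in b:
                    state["b_minus_a"].append(key)
            state["last_key"] = key
        # Write to a temp file, then rename, so a crash never leaves a torn checkpoint.
        tmp_path = checkpoint_path + ".tmp"
        with open(tmp_path, "w") as fh:
            json.dump(state, fh)
        os.replace(tmp_path, checkpoint_path)
    return state["a_minus_b"], state["b_minus_a"]


if __name__ == "__main__":
    bucket_a = {"x/1": "e1", "x/2": "e2", "x/3": "e3", "x/5": "e5"}
    bucket_b = {"x/1": "e1", "x/2": "CHANGED", "x/4": "e4", "x/5": "e5"}
    ckpt = os.path.join(tempfile.mkdtemp(), "checkpoint.json")
    resumable_diff(bucket_a, bucket_b, ckpt, stop_after_batches=1)  # interrupted run
    print(resumable_diff(bucket_a, bucket_b, ckpt))                 # resumed run
```

In this sketch an object whose ETag differs between the buckets appears in both (A−B) and (B−A), which is one reading of "name + ETag determines presence". Recording only the last completed key keeps the checkpoint small, at the cost of requiring a stable key order between runs.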

Performance (500K objects, 25 worker processes):

  • Shallow mode: ~3,900 objects/sec
  • Deep mode: ~1,345 objects/sec
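To put the reported throughput in terms of wall-clock time, the arithmetic below converts objects/sec into an estimated run duration. It is a back-of-the-envelope sketch: the helper name `estimated_seconds` is hypothetical, and extrapolating from the 500K-object benchmark to 10M objects assumes throughput stays constant, which the post does not claim.

```python
def estimated_seconds(object_count: int, objects_per_sec: float) -> float:
    """Estimated run time, assuming constant throughput (an assumption)."""
    return object_count / objects_per_sec


# Rates as reported in the post for 25 worker processes.
REPORTED_RATES = {"shallow": 3900.0, "deep": 1345.0}

for mode, rate in REPORTED_RATES.items():
    for count in (500_000, 10_000_000):
        minutes = estimated_seconds(count, rate) / 60
        print(f"{mode:>7}: {count:>10,} objects ~ {minutes:.1f} min")
```

On these figures, the 500K-object benchmark corresponds to roughly 2 minutes in shallow mode and roughly 6 minutes in deep mode, so deep mode is about 2.9 times slower per object.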

The tool works across any two S3-compatible buckets.
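As an illustration of what a deep check across two S3-compatible endpoints might look like, the sketch below uses boto3's `endpoint_url` option and `head_object` call. It is not the tool's code: the function names, the `DEEP_FIELDS` selection, and the way mismatches are reported are assumptions. Tag comparison is left out of this sketch, and credentials are assumed to come from the environment.

```python
from typing import Any, Dict, List

# HeadObject response fields compared in this sketch (boto3 response key names).
DEEP_FIELDS = (
    "Metadata",
    "ObjectLockMode",
    "ObjectLockRetainUntilDate",
    "ObjectLockLegalHoldStatus",
)


def make_client(endpoint_url: str):
    """Build a client for any S3-compatible endpoint (requires boto3)."""
    import boto3  # imported lazily so the pure comparison below has no dependencies

    return boto3.client("s3", endpoint_url=endpoint_url)


def deep_mismatches(head_a: Dict[str, Any], head_b: Dict[str, Any]) -> List[str]:
    """Return the names of the compared fields that differ between two responses."""
    return [f for f in DEEP_FIELDS if head_a.get(f) != head_b.get(f)]


def deep_check(client_a, bucket_a: str, client_b, bucket_b: str, key: str) -> List[str]:
    """Issue one HeadObject per side for the same key and compare the results."""
    head_a = client_a.head_object(Bucket=bucket_a, Key=key)
    head_b = client_b.head_object(Bucket=bucket_b, Key=key)
    return deep_mismatches(head_a, head_b)
```

Each deep check costs one extra request per object per bucket, which is consistent with deep mode being slower than shallow mode in the numbers above.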

Happy to answer any questions about the tool!
