I built a persistent LSM-Tree storage engine in Go from scratch
Well-crafted LSM-Tree reference implementation in Go—but RocksDB and BadgerDB already solve this production-grade.
Go-LSM is a persistent Key-Value engine that ensures durability via a Write-Ahead Log (WAL) and a SkipList MemTable. It manages sorted disk storage using SSTables with footer-based indexing and handles data lifecycle through K-Way Merge compaction.
LSM-Tree from scratch in Go: WAL, SkipList, SSTables, compaction—education-grade.
Backend engineers, database enthusiasts, systems programming students
RocksDB · LevelDB · Badger
INTRODUCTION:
I've always been curious about the internals of databases, so I decided to build my own Log-Structured Merge-Tree (LSM-Tree) engine in Go to understand the "magic" behind write-optimized storage.
Go-LSM is a persistent Key-Value engine that manages the full lifecycle of data from volatile RAM to immutable disk storage.
TECHNICAL HIGHLIGHTS:
1. THE DURABILITY LAYER (WAL)
To ensure zero data loss, I implemented an append-only Write-Ahead Log.
• Custom binary protocol: [Type][KeyLen][ValLen][Key][Value] • File.Sync() to force kernel flushes to physical hardware • Ensures absolute durability on system crashes
2. SKIPLIST MEMTABLEInstead of a standard tree, I used a SkipList for the in-memory layer.
• Provides O(log n) search and insertion without the rebalancing complexity of Red-Black trees • Keeps keys lexicographically sorted for the eventual SSTable flush • Enables efficient memory-to-disk transitions
3. SSTABLE FOOTER-BASED INDEXINGMy SSTables are binary-searchable on disk using a tail-first reading strategy:
• Jump to the last 8 bytes of a file to find the Index Block pointer • Avoid full file scans by reading directly to the index • Execute binary search on sorted keys for O(log n) disk lookups
4. MAINTENANCE LAYER: COMPACTIONBuilt a K-Way Merge compaction engine that handles performance optimization:
• Processes multiple SSTable layers and merges them into single, optimized files • Handles "Read Amplification" by reducing the number of files to check per query • Processes "Tombstones" to finalize deletions and reclaim disk space
5. TOOLING & DEBUGGINGCustom binary parsers transform raw binary files into human-readable tables:
• lsm-dump: View sorted SSTable contents • lsm-wal-dump: Inspect unflushed Write-Ahead Log entries • Enables deep inspection of internal storage layers
KEY ENGINEERING LESSONS:Moving from standard application development to systems programming required a fundamental shift in how I think about memory and I/O:
• ENDIANNESS LOGIC: Handling Big-Endian vs. Little-Endian conversions for cross-platform compatibility • FILE OFFSET MANAGEMENT: Manually managing byte offsets and file pointer positioning • CONCURRENCY & THREAD SAFETY: Implementing thread-safe mechanisms for MemTable flushing • BINARY PROTOCOL DESIGN: Creating efficient, compact data encodings for durability
REPOSITORY:Well-crafted LSM-Tree reference implementation in Go—but RocksDB and BadgerDB already solve this production-grade.
Rust LSM-Tree engine, but RocksDB and Redb already dominate this space.
Educational LSM implementation in pure Go; zero production use case demonstrated.
LSM engine learning project; interesting technical details but shipping status unclear.
LSM-tree with SSI, column families, and adaptive compaction—solid database primitives, nothing novel.
This actually simulates the full write path — WAL to memtable to immutable flush — and animates cascading leveled compactions so you can watch key movement and file counts in real time. The live bloom-filter checks and amplification metrics are the parts that will teach you something immediately; it's clearly built as a learning/debugging sandbox rather than a production profiler.