I built a persistent LSM-Tree storage engine in Go from scratch
LSM-Tree from scratch in Go: WAL, SkipList, SSTables, compaction—education-grade.
Go-LSM is a persistent Key-Value engine that ensures durability via a Write-Ahead Log (WAL) and a SkipList MemTable. It manages sorted disk storage using SSTables with footer-based indexing and handles data lifecycle through K-Way Merge compaction.
Well-crafted LSM-Tree reference implementation in Go—but RocksDB and BadgerDB already solve this production-grade.
Systems engineers, database enthusiasts, and CS students learning storage engine internals through clean reference implementation.
RocksDB · BadgerDB · LevelDB
I've always been curious about the internals of databases, so I decided to build my own Log-Structured Merge-Tree (LSM-Tree) engine in Go to understand the "magic" behind write-optimized storage.
Go-LSM is a persistent Key-Value engine that manages the full lifecycle of data from volatile RAM to immutable disk storage.
TECHNICAL HIGHLIGHTS:
1. THE DURABILITY LAYER (WAL)
To ensure zero data loss, I implemented an append-only Write-Ahead Log.
• Custom binary protocol: [Type][KeyLen][ValLen][Key][Value] • File.Sync() to force kernel flushes to physical hardware • Ensures absolute durability on system crashes
2. SKIPLIST MEMTABLE Instead of a standard tree, I used a SkipList for the in-memory layer.• Provides O(log n) search and insertion without the rebalancing complexity of Red-Black trees • Keeps keys lexicographically sorted for the eventual SSTable flush • Enables efficient memory-to-disk transitions
3. SSTABLE FOOTER-BASED INDEXING My SSTables are binary-searchable on disk using a tail-first reading strategy:• Jump to the last 8 bytes of a file to find the Index Block pointer • Avoid full file scans by reading directly to the index • Execute binary search on sorted keys for O(log n) disk lookups
4. MAINTENANCE LAYER: COMPACTION Built a K-Way Merge compaction engine that handles performance optimization:• Processes multiple SSTable layers and merges them into single, optimized files • Handles "Read Amplification" by reducing the number of files to check per query • Processes "Tombstones" to finalize deletions and reclaim disk space
5. TOOLING & DEBUGGING Custom binary parsers transform raw binary files into human-readable tables:• lsm-dump: View sorted SSTable contents • lsm-wal-dump: Inspect unflushed Write-Ahead Log entries • Enables deep inspection of internal storage layers
KEY ENGINEERING LESSONS: Moving from standard application development to systems programming required a fundamental shift in how I think about memory and I/O:• ENDIANNESS LOGIC: Handling Big-Endian vs. Little-Endian conversions for cross-platform compatibility • FILE OFFSET MANAGEMENT: Manually managing byte offsets and file pointer positioning • CONCURRENCY & THREAD SAFETY: Implementing thread-safe mechanisms for MemTable flushing • BINARY PROTOCOL DESIGN: Creating efficient, compact data encodings for durability
REPOSITORY: https://github.com/Jyotishmoy12/LSM-Tree-in-GolangLSM-Tree from scratch in Go: WAL, SkipList, SSTables, compaction—education-grade.
Educational LSM implementation in pure Go; zero production use case demonstrated.
Rust LSM-Tree engine, but RocksDB and Redb already dominate this space.
LSM-tree with SSI, column families, and adaptive compaction—solid database primitives, nothing novel.
LSM engine learning project; interesting technical details but shipping status unclear.
SQLite B-tree without SQL: 57–68% faster KV ops, single-header drop-in.