Back to browse
GitHub Repository

Go-LSM is a persistent Key-Value engine that ensures durability via a Write-Ahead Log (WAL) and a SkipList MemTable. It manages sorted disk storage using SSTables with footer-based indexing and handles data lifecycle through K-Way Merge compaction.

1 starsGo

I built a persistent LSM-Tree storage engine in Go from scratch

by Jyotishmoy·Feb 26, 2026·1 point·0 comments

AI Analysis

●●SolidWizardryBig BrainShip It

LSM-Tree from scratch in Go: WAL, SkipList, SSTables, compaction—education-grade.

Strengths
  • Full architecture implemented: WAL with hardware sync, SkipList memtable, footer-indexed SSTables, K-way merge compaction
  • Clear pedagogical value with 5-layer design, binary protocol details, and stress-test tooling (lsm-cli, lsm-dump)
  • Addresses real database internals: tombstone handling, read/write amplification, sorted persistence
Weaknesses
  • No benchmarks vs RocksDB/LevelDB—unclear real-world performance or production readiness claims
  • Educational value, but storage engines are well-understood territory; no novel algorithmic or architectural insight
Target Audience

Backend engineers, database enthusiasts, systems programming students

Similar To

RocksDB · LevelDB · Badger

Post Description

GO-LSM: BUILDING A LOG-STRUCTURED MERGE-TREE ENGINE

INTRODUCTION:

I've always been curious about the internals of databases, so I decided to build my own Log-Structured Merge-Tree (LSM-Tree) engine in Go to understand the "magic" behind write-optimized storage.

Go-LSM is a persistent Key-Value engine that manages the full lifecycle of data from volatile RAM to immutable disk storage.

TECHNICAL HIGHLIGHTS:

1. THE DURABILITY LAYER (WAL)

To ensure zero data loss, I implemented an append-only Write-Ahead Log.

• Custom binary protocol: [Type][KeyLen][ValLen][Key][Value] • File.Sync() to force kernel flushes to physical hardware • Ensures absolute durability on system crashes

2. SKIPLIST MEMTABLE

Instead of a standard tree, I used a SkipList for the in-memory layer.

• Provides O(log n) search and insertion without the rebalancing complexity of Red-Black trees • Keeps keys lexicographically sorted for the eventual SSTable flush • Enables efficient memory-to-disk transitions

3. SSTABLE FOOTER-BASED INDEXING

My SSTables are binary-searchable on disk using a tail-first reading strategy:

• Jump to the last 8 bytes of a file to find the Index Block pointer • Avoid full file scans by reading directly to the index • Execute binary search on sorted keys for O(log n) disk lookups

4. MAINTENANCE LAYER: COMPACTION

Built a K-Way Merge compaction engine that handles performance optimization:

• Processes multiple SSTable layers and merges them into single, optimized files • Handles "Read Amplification" by reducing the number of files to check per query • Processes "Tombstones" to finalize deletions and reclaim disk space

5. TOOLING & DEBUGGING

Custom binary parsers transform raw binary files into human-readable tables:

• lsm-dump: View sorted SSTable contents • lsm-wal-dump: Inspect unflushed Write-Ahead Log entries • Enables deep inspection of internal storage layers

KEY ENGINEERING LESSONS:

Moving from standard application development to systems programming required a fundamental shift in how I think about memory and I/O:

• ENDIANNESS LOGIC: Handling Big-Endian vs. Little-Endian conversions for cross-platform compatibility • FILE OFFSET MANAGEMENT: Manually managing byte offsets and file pointer positioning • CONCURRENCY & THREAD SAFETY: Implementing thread-safe mechanisms for MemTable flushing • BINARY PROTOCOL DESIGN: Creating efficient, compact data encodings for durability

REPOSITORY:

https://github.com/Jyotishmoy12/LSM-Tree-in-Golang

Similar Projects

Infrastructure●●Solid

I built a persistent LSM-Tree storage engine in Go from scratch

Well-crafted LSM-Tree reference implementation in Go—but RocksDB and BadgerDB already solve this production-grade.

Big BrainWizardry
Jyotishmoy
103mo ago
Developer Tools●●Solid

I built an interactive LSM Tree simulator

This actually simulates the full write path — WAL to memtable to immutable flush — and animates cascading leveled compactions so you can watch key movement and file counts in real time. The live bloom-filter checks and amplification metrics are the parts that will teach you something immediately; it's clearly built as a learning/debugging sandbox rather than a production profiler.

Niche GemWizardry
saiprakashreddy
103mo ago