What I learned running a crypto data pipeline at 120M messages/day

Author: Qalypto

by Qalypto·Mar 4, 2026·4 points·1 comment

AI Analysis

● Mid · Big Brain

Valuable operational learnings but this is a blog post, not a product.

Strengths

•Real production numbers: 120M messages/day on €46/month infrastructure
•Specific solutions for orderbook gaps, ClickHouse batching, logging OOM

Weaknesses

•Not a product or tool you can use, just educational content
•No code repository or reusable components to evaluate

Post Description

Been running this for 5 months now. 4 exchanges (Binance, Bybit, OKX, Bitget), 10 perpetuals, all on a single Hetzner box for 46 euro/month.

Stack: Python asyncio, Kafka in KRaft mode, ClickHouse, k3s. Cloudflare Tunnel handles ingress.

Some things that broke along the way:

ORDERBOOK GAPS Exchanges skip sequence numbers sometimes. Your local book drifts and you don't notice until something goes wrong. Had to build per-symbol gap detection with automatic snapshot recovery. Each exchange does sequencing differently, so that's four separate implementations.
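For anyone who wants the shape of the gap detection, here is a minimal sketch, not the production code. It assumes the simplest scheme, where each update carries a sequence number exactly one higher than the previous one; real exchanges differ, so treat the `+ 1` check as a placeholder. `GapDetector`, `request_snapshot` and the symbol names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Set


@dataclass
class GapDetector:
    """Tracks the last sequence number per symbol and flags gaps."""

    # Hypothetical callback: asked to fetch a fresh snapshot for a symbol.
    request_snapshot: Callable[[str], None]
    last_seq: Dict[str, int] = field(default_factory=dict)
    awaiting: Set[str] = field(default_factory=set)
    gaps: int = 0

    def on_snapshot(self, symbol: str, seq: int) -> None:
        """A snapshot arrived: reset the baseline and resume applying updates."""
        self.last_seq[symbol] = seq
        self.awaiting.discard(symbol)

    def on_update(self, symbol: str, seq: int) -> bool:
        """Return True if the update should be applied to the local book."""
        if symbol in self.awaiting:
            return False  # snapshot already requested, drop until it lands
        prev = self.last_seq.get(symbol)
        if prev is None:
            self._resync(symbol)  # no baseline yet
            return False
        if seq <= prev:
            return False  # stale or duplicate update
        if seq != prev + 1:  # assumption: contiguous sequence numbers
            self.gaps += 1
            self._resync(symbol)
            return False
        self.last_seq[symbol] = seq
        return True

    def _resync(self, symbol: str) -> None:
        """Forget the baseline and request exactly one snapshot."""
        self.awaiting.add(symbol)
        self.last_seq.pop(symbol, None)
        self.request_snapshot(symbol)
```

The `awaiting` set is there so a burst of updates after a gap triggers one snapshot request instead of one per message.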

CLICKHOUSE INSERTS Started with small batches, and ClickHouse sat at 30% CPU just doing merges. Bumped the batch size to 5000 rows with 2-second flush intervals and CPU dropped to 8%. Also moved inserts to an async queue so the Kafka consumer never blocks.
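A rough sketch of the batching idea with plain asyncio: flush when the batch reaches 5000 rows or 2 seconds after its first row, whichever comes first. `batch_writer` and `insert_batch` are hypothetical names; `insert_batch` stands in for whatever ClickHouse client call does the actual write, and `None` is used as a shutdown sentinel.

```python
import asyncio
from typing import Awaitable, Callable, List, Optional

Row = tuple


async def batch_writer(
    queue: "asyncio.Queue[Optional[Row]]",
    insert_batch: Callable[[List[Row]], Awaitable[None]],
    max_rows: int = 5000,
    max_wait: float = 2.0,
) -> None:
    """Drain the queue into batches, flushing on size or age."""
    loop = asyncio.get_running_loop()
    while True:
        first = await queue.get()
        if first is None:  # sentinel: stop with nothing pending
            return
        batch: List[Row] = [first]
        deadline = loop.time() + max_wait  # age is measured from the first row
        stop = False
        while len(batch) < max_rows:
            timeout = deadline - loop.time()
            if timeout <= 0:
                break
            try:
                row = await asyncio.wait_for(queue.get(), timeout)
            except asyncio.TimeoutError:
                break  # batch is old enough, flush what we have
            if row is None:  # sentinel: flush the partial batch, then stop
                stop = True
                break
            batch.append(row)
        await insert_batch(batch)
        if stop:
            return
```

On the consumer side this would look like `queue.put_nowait(row)` after decoding each Kafka message. An unbounded queue means the consumer never waits, but memory grows if ClickHouse stalls; a bounded queue with `await queue.put(row)` trades that for backpressure.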

LOGGING At 500 msg/s the logger was allocating thousands of strings per second. OOM killer got us twice before I figured it out. Set everything on the data path to WARNING and it went away.
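A small sketch of what the logging change can look like with the standard library, assuming the allocations came from building log strings per message. The logger name `pipeline.datapath` and `handle_message` are made up for illustration.

```python
import json
import logging

# Hypothetical logger name for everything on the hot path.
data_log = logging.getLogger("pipeline.datapath")
data_log.setLevel(logging.WARNING)


def handle_message(msg: dict) -> None:
    # %-style arguments are only formatted if the record is actually emitted,
    # so at WARNING level this call returns before building any string.
    data_log.debug("tick %s seq=%s", msg["symbol"], msg["seq"])

    # Arguments are still evaluated eagerly, so guard anything expensive.
    if data_log.isEnabledFor(logging.DEBUG):
        data_log.debug("payload %s", json.dumps(msg))

    if msg["seq"] < 0:
        data_log.warning("negative seq for %s: %s", msg["symbol"], msg["seq"])
```

Note that an f-string inside a `debug()` call is built even when the level is WARNING, which is the kind of per-message allocation this avoids.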

Current numbers:
- 120M+ messages/day
- Latency: P50 250ms, P95 400ms
- >99.8% data coverage
- 5 months, no major incidents

If anyone wants to poke around the data:

qalypto.com/data-lab

CSV samples, no signup needed.

Happy to answer questions.