Hacker News archive (47M+ items, 11.6GB) as Parquet, updated every 5m

by tamnd·Mar 14, 2026·408 points·167 comments

AI Analysis

●● Solid · Niche Gem · Solve My Problem

47M HN items in Parquet, auto-updating every 5 minutes on Hugging Face.

Strengths
  • Parquet format optimized for analytics and ML pipelines
  • Auto-updates every 5 minutes, so the data stays current
  • Hosted on Hugging Face with built-in dataset viewer and API access
Weaknesses
  • HN data is already available via BigQuery and the official Firebase API
  • Parquet format is nice but not groundbreaking for data engineers
Category
Target Audience

Data scientists, ML researchers, HN analysts

Similar To

Google BigQuery public datasets · Firebase HN dump · Hugging Face datasets

Similar Projects

Data · ●●● Banger

HN-fdw – All of Hacker News, queryable from Postgres, with zero copies

Zero-copy Postgres queries against 47M rows using DuckDB FDW and HTTP range requests.

Wizardry · Big Brain · Dark Horse
tamnd
202mo ago
Open Source · ●●● Banger

Self-hosted static archive of 20 years of Hacker News

Runs 22GB of HN history entirely in-browser using lazy-loaded SQLite shards over WebAssembly.

Rabbit Hole · Zero to One · Wizardry
keepamovin
1125d ago