I built a local data lake for AI powered data engineering and analytics

Author: vpfaiz

by vpfaiz·Apr 8, 2026·14 points·10 comments


AI Analysis

●Mid · Solve My Problem · Ship It

Zero-cloud data stack with built-in LLMs, but DuckDB already does local analytics.

Strengths

•Built-in Gemma and Qwen models require no separate LLM setup
•Zero-ETL ingestion from databases, webpages, CSV, Excel files
•Lineage and versioning tracked locally without cloud accounts

Weaknesses

•Competes with DuckDB, Jupyter, and local Databricks setups without a clear moat
•Early stage, and the 16 GB RAM minimum limits accessibility

Post Description

I got tired of the overhead required to run even a simple data analysis (cloud setup, ETL pipelines, orchestration, cost monitoring), so I built a fully local data stack/IDE where I can write SQL or Python, run it, see results, and iterate quickly and interactively.

You get a data-lake-style catalog, zero-ETL ingestion, lineage, versioning, and analytics running entirely on your machine. You can import from a database, webpage, CSV, etc. and query in natural language or do your own work in SQL/PySpark. Connect to local models like Gemma or cloud LLMs like Claude for querying and analysis. You don’t have to set up local LLMs; they come built in.
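To illustrate the import-then-query loop described above, here is a minimal sketch using only Python's built-in csv and sqlite3 modules. It is not Nile's API and does not reflect how the tool is implemented; the table name `sales`, the helper names, and the sample data are all hypothetical, chosen just to show a CSV being loaded locally and queried with SQL without any cloud service.

```python
import csv
import io
import sqlite3

# Hypothetical sample data standing in for an imported CSV file.
SAMPLE_CSV = """region,amount
north,120
south,80
north,50
"""


def load_csv(conn, table, text):
    """Create a table from the CSV header row and insert the remaining rows."""
    rows = list(csv.reader(io.StringIO(text)))
    header, data = rows[0], rows[1:]
    cols = ", ".join(f'"{name}"' for name in header)
    marks = ", ".join("?" for _ in header)
    conn.execute(f'CREATE TABLE "{table}" ({cols})')
    conn.executemany(f'INSERT INTO "{table}" VALUES ({marks})', data)


def totals_by_region(conn):
    """Sum amounts per region; values were stored as text, so cast them."""
    query = """
        SELECT region, SUM(CAST(amount AS INTEGER)) AS total
        FROM sales
        GROUP BY region
        ORDER BY region
    """
    return conn.execute(query).fetchall()


# An in-memory database keeps everything on the local machine.
conn = sqlite3.connect(":memory:")
load_csv(conn, "sales", SAMPLE_CSV)
result = totals_by_region(conn)
print(result)
```

Swapping `:memory:` for a file path would persist the table between runs; a real local stack would add the cataloging, lineage, and versioning features the post mentions, which this sketch leaves out.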

This is completely free. No cloud account required.

Download the software - https://getnile.ai/downloads

Watch a demo - https://www.youtube.com/watch?v=C6qSFLylryk

Check the code repo - https://github.com/NileData/local

This is still early and I'd genuinely love your feedback on what's broken, what's missing, and if you find this useful for your data and analytics work.

Similar Projects

Infrastructure · ●●Solid

Orchestera – Managed Apache Spark on Kubernetes in Your Own AWS Account

Spark without Databricks markup, but Kubernetes management is still ops work.

Solve My Problem · Dark Horse

iamspoilt

313mo ago

Hardware · ●Mid

Decima-8 – A deterministic 230KB neuromorphic engine and 1.3MB IDE

Neuromorphic engine on a deterministic rhythm, but v0.2 is in design freeze with no working demo yet.

Bold Bet · Wizardry

intentgarden

213mo ago

AI/ML · ●●●Banger

Drift – an embedding-model upgrade should be a rotation, not a reindex

Orthogonal Procrustes migration means embedding model upgrades without reindexing.

Wizardry · Big Brain

aayush4vedi

634d ago

Education · ●Mid

I built a small repertoire of different computing systems

Curated list of computing substrates lacks depth beyond basic taxonomy tags.

Niche Gem · Cozy

tugdual

401mo ago

Infrastructure · ●●Solid

IceGate – Observability data lake engine

Apache Iceberg for observability data cuts costs versus Datadog and Honeycomb pricing.

Big Brain · Bold Bet

mineev

1552mo ago

Developer Tools · ●●Solid

Promo-kit – Receipt-backed promotion engine for tool catalogs

Receipt-backed promotion decisions with SHA-256 hashes and commit linkage are a practical, low-ceremony way to make spotlight selections auditable. The zero-dependency CLI, freeze modes, and drift reports show this was designed for governance-first catalogs rather than casual lists: useful and sensible, but narrowly aimed.

Niche Gem · Ship It

mikeyfrilot

123mo ago