GoldenMatch – Entity resolution with LLM scoring, 97% F1, no Spark
Fellegi-Sunter matching with active learning beats Dedupe.io on complex datasets.
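Fellegi-Sunter scoring, named in the tagline, adds up per-field log-likelihood ratios: an agreeing field contributes log2(m/u) and a disagreeing field contributes log2((1-m)/(1-u)). The total is then compared against two thresholds. The sketch below shows only that arithmetic. It is not GoldenMatch's API: `FieldParams`, `match_weight`, `classify`, the field names, the m/u values and the thresholds are all hypothetical. In practice m and u are usually estimated from data, for example with expectation-maximization, rather than set by hand.

```python
from __future__ import annotations

import math
from dataclasses import dataclass


@dataclass(frozen=True)
class FieldParams:
    m: float  # P(field agrees | records are a true match)
    u: float  # P(field agrees | records are a non-match)


def match_weight(agreements: dict[str, bool], params: dict[str, FieldParams]) -> float:
    """Total Fellegi-Sunter weight: sum of per-field log2 likelihood ratios."""
    total = 0.0
    for field, agrees in agreements.items():
        p = params[field]
        if agrees:
            total += math.log2(p.m / p.u)
        else:
            total += math.log2((1 - p.m) / (1 - p.u))
    return total


def classify(weight: float, upper: float, lower: float) -> str:
    """Three-way decision: match, possible match (clerical review), non-match."""
    if weight >= upper:
        return "match"
    if weight <= lower:
        return "non-match"
    return "possible"


# Hypothetical m/u values, for illustration only.
PARAMS = {
    "surname": FieldParams(m=0.95, u=0.01),
    "birth_year": FieldParams(m=0.90, u=0.05),
    "zip": FieldParams(m=0.85, u=0.10),
}
```

Here, a pair that agrees only on surname gets a weight of about 0.74. With `upper=8.0` and `lower=0.0`, that pair lands in the "possible" band. Uncertain pairs like this are the ones an active-learning loop would typically send to a human for labeling.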
Zero-config entity resolution that scales from a single CSV to 100M+ rows on a Ray cluster (verified: 100M rows deduplicated in 213 s with 0.30 GB of driver memory). Fuzzy + exact + probabilistic dedupe, identity graph, PPRL (privacy-preserving record linkage), LLM boost. Python + full TypeScript port; SQL-native in PostgreSQL and DuckDB; MCP/REST servers; dbt + Airflow recipes.
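The listing does not describe how GoldenMatch implements PPRL. One widely used technique encodes each identifier's character bigrams into a Bloom filter with keyed hashes. Parties then compare the encodings, not the raw values, using the Dice coefficient. The sketch below illustrates that general technique under stated assumptions: `bigrams`, `bloom_encode` and `dice` are hypothetical helpers, and the filter size, hash count, HMAC-SHA256 double hashing and shared key are all illustrative choices.

```python
from __future__ import annotations

import hashlib
import hmac


def bigrams(value: str) -> set[str]:
    """Padded character bigrams of a trimmed, lower-cased string."""
    s = f"_{value.strip().lower()}_"
    return {s[i:i + 2] for i in range(len(s) - 1)}


def bloom_encode(value: str, key: bytes, size: int = 1000, num_hashes: int = 20) -> set[int]:
    """Return the set bit positions of a Bloom filter encoding the value's bigrams.

    Assumption: keyed HMAC-SHA256 with double hashing, position = (h1 + i*h2) mod size.
    """
    bits: set[int] = set()
    for gram in bigrams(value):
        h1 = int.from_bytes(hmac.new(key, b"1" + gram.encode(), hashlib.sha256).digest(), "big")
        h2 = int.from_bytes(hmac.new(key, b"2" + gram.encode(), hashlib.sha256).digest(), "big")
        for i in range(num_hashes):
            bits.add((h1 + i * h2) % size)
    return bits


def dice(a: set[int], b: set[int]) -> float:
    """Dice coefficient between two Bloom filters, each given as its set bits."""
    if not a and not b:
        return 1.0
    return 2 * len(a & b) / (len(a) + len(b))
```

For example, with a shared key, "Katherine" and "Catherine" share 8 of their 10 bigrams, so their encodings overlap heavily. "Katherine" and "Robert" share only "er". Their Dice score is therefore much lower, even though neither party ever sees the other's plaintext.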
Ray-based dedupe at 100M rows without Spark is a real architectural choice.
Data engineers, data scientists
Splink · Dedupe.io · OpenRefine
100M free tokens is generous, but Hugging Face and Replicate already host models.
Ray-casting engine brings retro Wolf3D vibes to a browser-based moon trucking sim.
Spark without Databricks markup, but Kubernetes management is still ops work.
ByteBuddy injects trace context into Spark tasks; sees executor-level details no competitor offers.
TPC-H (1 GB) in 2 seconds on an iPhone, with Arrow Flight SQL running locally.