Stop silent data leakage in ML training pipelines.

4 stars · Python

Timefence – Python lib to detect temporal data leak in ML training

by Emojizing·Feb 12, 2026·2 points·0 comments

AI Analysis

●●● Banger · Solve My Problem · Big Brain

Catches silent temporal data leakage that train/test metrics miss. DuckDB-powered; the author reports 1M labels × 10 features in ~12s.

Strengths
  • Solves a real, hard-to-see problem for ML teams: no errors are thrown, and the metrics look better than they should.
  • DuckDB backend handles a reported 1M labels × 10 features in ~12s; the quickstart example ships with deliberately leaky data, so you can verify the audit immediately.
  • Point-in-time correct joins are non-trivial; the implementation here is a genuine productivity win.
Weaknesses
  • Narrow audience: only valuable if you're doing feature engineering with temporal joins.
  • Documentation examples show the happy path; edge cases around multiple timestamp columns are unclear.
Target Audience

ML/Data engineers building training pipelines

Similar To

Great Expectations (data validation framework) · Feast (feature store with temporal logic)

Post Description

Hey folks. I built Timefence because I kept hitting the same bug when building ML training sets. You LEFT JOIN feature tables to labels, and some rows end up with feature timestamps after the prediction event, so the model trains on future data. Took me forever to debug the first time because nothing errors out.
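
To make the failure mode concrete, here is a minimal sketch of how a key-only LEFT JOIN can pull in a future feature row without raising any error. It is not Timefence code and does not use its API. It uses Python's built-in sqlite3 module as a stand-in for DuckDB, and the table names, column names (user_id, label_time, feature_time, logins_30d), and data are all hypothetical.

```python
import sqlite3

# Hypothetical toy data: one label per user, several feature snapshots.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE labels (user_id INTEGER, label_time TEXT, churned INTEGER);
    CREATE TABLE features (user_id INTEGER, feature_time TEXT, logins_30d INTEGER);
    INSERT INTO labels VALUES (1, '2024-03-01', 0), (2, '2024-03-01', 1);
    INSERT INTO features VALUES
        (1, '2024-02-20', 14),
        (1, '2024-03-10', 2),   -- recorded AFTER the label time for user 1
        (2, '2024-02-25', 5);
""")

# Naive join on the key only: nothing constrains feature_time.
naive_rows = conn.execute("""
    SELECT l.user_id, l.label_time, f.feature_time, f.logins_30d
    FROM labels AS l
    LEFT JOIN features AS f ON f.user_id = l.user_id
    ORDER BY l.user_id, f.feature_time
""").fetchall()

# The join succeeds silently; the leak only shows up if you compare timestamps.
# ISO-8601 date strings compare correctly as plain strings.
leaked_rows = [
    row for row in naive_rows
    if row[2] is not None and row[2] > row[1]
]

print(f"{len(naive_rows)} joined rows, {len(leaked_rows)} with future features")
for row in leaked_rows:
    print("leaked:", row)
```

The query runs cleanly and returns a plausible training set, which is why this kind of bug tends to surface only when someone compares the timestamps explicitly.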

Timefence audits your dataset for any rows where feature_time > label_time, and can rebuild it with point-in-time correct joins. Built on DuckDB, handles 1M labels × 10 features in ~12s. Also has a --strict flag for CI.
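
For illustration, a point-in-time correct join can be sketched as "for each label, take the most recent feature row at or before the label time". The snippet below is an assumption-laden sketch of that idea, not Timefence's implementation or API. It uses Python's built-in sqlite3 module instead of DuckDB, and the schema and data are hypothetical.

```python
import sqlite3

# Hypothetical toy data; user 3 has only a future feature row.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE labels (user_id INTEGER, label_time TEXT, churned INTEGER);
    CREATE TABLE features (user_id INTEGER, feature_time TEXT, logins_30d INTEGER);
    INSERT INTO labels VALUES
        (1, '2024-03-01', 0), (2, '2024-03-01', 1), (3, '2024-03-01', 0);
    INSERT INTO features VALUES
        (1, '2024-02-10', 20),
        (1, '2024-02-20', 14),
        (1, '2024-03-10', 2),
        (2, '2024-02-25', 5),
        (3, '2024-03-05', 9);
""")

# Point-in-time join: keep only the latest feature row whose timestamp is
# at or before the label time. Labels with no eligible row get NULL features.
pit_rows = conn.execute("""
    SELECT l.user_id, l.label_time, f.feature_time, f.logins_30d
    FROM labels AS l
    LEFT JOIN features AS f
      ON f.user_id = l.user_id
     AND f.feature_time = (
         SELECT MAX(f2.feature_time)
         FROM features AS f2
         WHERE f2.user_id = l.user_id
           AND f2.feature_time <= l.label_time
     )
    ORDER BY l.user_id
""").fetchall()

# Audit condition from the post: flag any row where feature_time > label_time.
violations = [
    row for row in pit_rows
    if row[2] is not None and row[2] > row[1]
]

for row in pit_rows:
    print(row)
print("violations:", len(violations))
```

In this sketch a label with no eligible feature row keeps NULL features rather than borrowing a future value. Whether Timefence handles that case the same way is not stated in the post.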

pip install timefence
timefence quickstart churn-example && cd churn-example
timefence audit data/train_LEAKY.parquet

MIT licensed. Happy to answer any questions you might have.