GitHub Repository

Stop silent data leakage in ML training pipelines.

4 stars · Python

Timefence – Python lib to detect temporal data leak in ML training

by Emojizing · Feb 12, 2026 · 2 points · 0 comments

AI Analysis

●●● Banger · Solve My Problem · Big Brain

Catches silent data leakage that train/test metrics miss. DuckDB-powered; the author reports 1M labels × 10 features in ~12s.

Strengths

• Solves a real, invisible problem that haunts ML teams: no errors are thrown, and the metrics lie.
• DuckDB backend scales to millions of rows; the quickstart example ships with actual leaky data, so you can verify the audit immediately.
• Point-in-time correct joins are non-trivial; the implementation here is a genuine productivity win.

Weaknesses

• Narrow audience: only valuable if you're doing feature engineering with temporal joins.
• Documentation examples show the happy path; behavior with multiple timestamp columns is unclear.

Post Description

Hey folks. I built Timefence because I kept hitting the same bug when building ML training sets. You LEFT JOIN feature tables to labels, and some rows end up with feature timestamps after the prediction event, so the model trains on future data. It took me forever to debug the first time because nothing errors out.
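The failure mode described above can be reproduced in a few lines. This is a minimal sketch, not Timefence's code: it uses the standard-library sqlite3 module rather than DuckDB, and the table names, column names, and values are all hypothetical.

```python
import sqlite3

# Hypothetical toy data: one label row per user, several feature snapshots.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE labels (user_id INTEGER, label_time TEXT, churned INTEGER);
CREATE TABLE features (user_id INTEGER, feature_time TEXT, support_tickets INTEGER);
INSERT INTO labels VALUES (1, '2025-03-01', 1), (2, '2025-03-01', 0);
INSERT INTO features VALUES
  (1, '2025-02-20', 2),
  (1, '2025-03-05', 9),  -- recorded after the label time
  (2, '2025-02-25', 0);
""")

# Naive join on the key only: nothing constrains feature_time.
naive_rows = conn.execute("""
SELECT l.user_id, l.label_time, f.feature_time, f.support_tickets
FROM labels AS l
LEFT JOIN features AS f ON f.user_id = l.user_id
ORDER BY l.user_id, f.feature_time
""").fetchall()

# ISO-8601 date strings compare correctly as plain strings.
leaky_rows = [r for r in naive_rows if r[2] is not None and r[2] > r[1]]

print(len(naive_rows), "joined rows,", len(leaky_rows), "from the future")
for row in leaky_rows:
    print(row)
```

The query runs without any error or warning, which is why this kind of leak tends to show up only as suspiciously good validation metrics.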

Timefence audits your dataset for any rows where feature_time > label_time, and can rebuild it with point-in-time correct joins. Built on DuckDB, handles 1M labels × 10 features in ~12s. Also has a --strict flag for CI.
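To make the two operations concrete, here is a rough pure-Python sketch of an audit check and a point-in-time join. It is not Timefence's implementation or API: the function names `audit` and `point_in_time_join`, the dict-based row layout, and the column names are assumptions made for illustration, and it only handles a single feature table keyed by a hypothetical `user_id`.

```python
from bisect import bisect_right
from collections import defaultdict

def audit(rows):
    """Return the rows whose feature_time is later than their label_time."""
    return [
        r for r in rows
        if r["feature_time"] is not None and r["feature_time"] > r["label_time"]
    ]

def point_in_time_join(labels, features):
    """Attach to each label the latest feature row with feature_time <= label_time."""
    history_by_user = defaultdict(list)
    for f in sorted(features, key=lambda f: (f["user_id"], f["feature_time"])):
        history_by_user[f["user_id"]].append(f)

    joined = []
    for label in labels:
        history = history_by_user.get(label["user_id"], [])
        times = [f["feature_time"] for f in history]
        # Index of the first snapshot strictly after label_time.
        idx = bisect_right(times, label["label_time"])
        match = history[idx - 1] if idx else None
        joined.append({
            **label,
            "feature_time": match["feature_time"] if match else None,
            "support_tickets": match["support_tickets"] if match else None,
        })
    return joined

# Hypothetical toy data; ISO-8601 date strings sort chronologically.
labels = [
    {"user_id": 1, "label_time": "2025-03-01", "churned": 1},
    {"user_id": 2, "label_time": "2025-03-01", "churned": 0},
    {"user_id": 3, "label_time": "2025-03-01", "churned": 0},
]
features = [
    {"user_id": 1, "feature_time": "2025-02-20", "support_tickets": 2},
    {"user_id": 1, "feature_time": "2025-03-05", "support_tickets": 9},
    {"user_id": 2, "feature_time": "2025-02-25", "support_tickets": 0},
]

clean = point_in_time_join(labels, features)
print(audit(clean))  # no rows flagged after the point-in-time join
```

A label with no earlier snapshot (user 3 here) keeps empty feature values instead of borrowing a later one, which is the behavior a point-in-time join is meant to guarantee.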

pip install timefence
timefence quickstart churn-example && cd churn-example
timefence audit data/train_LEAKY.parquet

MIT licensed. Happy to answer any questions you might have.

Similar Projects

AI/ML · ●● Solid

RewardGuard – detect reward hacking in RL training loops

Catches reward hacking before it tanks your RL training run.

Niche Gem · Big Brain

Giovan321

111mo ago

AI/ML · ●●● Banger

A memory database that forgets, consolidates, and detects contradiction

Vector DBs store memories; this one forgets, consolidates, and flags contradictions like human memory.

Big Brain · Zero to One · Wizardry

pranabsarkar

48332mo ago

Developer Tools · ●● Solid

Extra-Platforms, Python library to detect OS, arch, shell, CI, AI

Zero-dependency Python library replacing removed stdlib functions with comprehensive platform detection.

Solve My Problem · Cozy

kdeldycke

922mo ago

AI/ML · ● Mid

Trained YOLOX from scratch to avoid Ultralytics (aircraft detection)

The author documents ripping out Ultralytics and training YOLOX end-to-end on an aircraft dataset, releasing code under an MIT license so you can run and modify the whole pipeline yourself. This is the sort of no-frills, reproducible recipe that saves time if you need full control over configs, checkpoints and licensing — not novel research, but genuinely useful for people who hit the limits of packaged repos.

Niche Gem · Solve My Problem

auspiv

213mo ago

AI/ML · ●●● Banger

Detecting deepfakes without sending your files to a cloud API

Detects deepfakes locally using optical flow vectors instead of sending files to cloud APIs.

Big Brain · Wizardry

XQorp

201mo ago

Security · ● Mid

API key leak scanner – finds and shows credentials in your codebase

Yet another secret scanner, but this one's a single Python file.

Ship It

JasperBlank2001

113mo ago