I'm tracking 197 known exposures of health data from UK Biobank

by Cynddl·Apr 23, 2026·197 points·57 comments

AI Analysis

●● Solid · Dark Horse · Big Brain

Uses public DMCA notices to track 197 GitHub repos that exposed UK Biobank health data.

Strengths
  • Unique data source (GitHub DMCA archive) applied to privacy research.
  • Clear visualizations of takedown trends and exposed file types.
  • Highlights systemic governance failures using concrete and public evidence.
Weaknesses
  • Read-only tracker with no mechanism to prevent leaks proactively.
  • Dependent on GitHub's public DMCA archive data for completeness.
Target Audience

Privacy researchers, Security professionals

Similar To

Lumen Database · Have I Been Pwned

Post Description

I'm a researcher studying privacy, and I started tracking the DMCA notices that UK Biobank sends to GitHub. I have tracked 110 notices filed so far, targeting 197 code repositories by 170 developers across the world.
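As a rough illustration of how counts like these could be derived, here is a minimal sketch that pulls repository URLs out of notice text and tallies distinct repositories and owners. It assumes each notice is available as a plain-text string; the function name `summarize_notices` and the sample notices are hypothetical, and this is not necessarily how the tracker itself works.

```python
import re

# Matches github.com/<owner>/<repo>, ignoring any file path that follows.
REPO_URL = re.compile(r"https?://github\.com/([A-Za-z0-9-]+)/([A-Za-z0-9._-]+)")

def summarize_notices(notices):
    """Count notices, distinct repositories, and distinct repository owners."""
    repos = set()
    for text in notices:
        for owner, repo in REPO_URL.findall(text):
            # Drop a sentence-ending period or a ".git" suffix caught by the regex.
            repo = repo.rstrip(".").removesuffix(".git")
            repos.add((owner.lower(), repo.lower()))
    owners = {owner for owner, _ in repos}
    return {
        "notices": len(notices),
        "repositories": len(repos),
        "developers": len(owners),
    }

# Hypothetical notice texts, for illustration only.
sample = [
    "Please remove https://github.com/alice/ukb-analysis/blob/main/pheno.csv "
    "and https://github.com/alice/ukb-analysis/blob/main/geno.bed",
    "Infringing content: https://github.com/bob/gwas-scripts.",
    "See https://github.com/alice/other-project.git",
]
summary = summarize_notices(sample)
```

Counting owners rather than repositories matters here because one developer can appear in several notices or own several targeted repositories.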

Looking at the takedown notices, we often see specific files being targeted rather than entire repositories. This is possibly to justify the copyright infringement claim, as required for a takedown notice, though I am not a copyright expert. It is clear, however, that UK Biobank only uses DMCA notices as a last resort, for GitHub users it cannot identify and who were likely not given access in the first place. A quarter of the targeted files are genetic or genomic data. Tabular data accounts for another large share and could contain phenotype or health records.
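A breakdown by file type like the one above could be sketched as follows. The extension-to-category mapping is an assumption made for illustration (common PLINK, BGEN, and VCF extensions for genetic data, and common spreadsheet or delimited formats for tabular data), not the author's actual taxonomy, and the file paths are hypothetical.

```python
from collections import Counter
from pathlib import PurePosixPath

# Assumed mapping from file extension to category; adjust as needed.
GENETIC = {".bed", ".bim", ".fam", ".bgen", ".vcf", ".pgen"}
TABULAR = {".csv", ".tsv", ".tab", ".xlsx"}
COMPRESSION = {".gz", ".zip", ".bz2"}

def classify(path):
    """Return 'genetic', 'tabular', or 'other' based on the file extension."""
    suffixes = [s.lower() for s in PurePosixPath(path).suffixes]
    # Look past compression suffixes, so "chr1.vcf.gz" is treated as ".vcf".
    while suffixes and suffixes[-1] in COMPRESSION:
        suffixes.pop()
    ext = suffixes[-1] if suffixes else ""
    if ext in GENETIC:
        return "genetic"
    if ext in TABULAR:
        return "tabular"
    return "other"

def file_type_shares(paths):
    """Return the fraction of paths falling in each category."""
    counts = Counter(classify(p) for p in paths)
    total = len(paths)
    return {category: n / total for category, n in counts.items()} if total else {}

# Hypothetical targeted file paths, for illustration only.
paths = ["geno/chr1.vcf.gz", "pheno.csv", "notes.md", "data/ukb.tsv"]
shares = file_type_shares(paths)
```

An extension only hints at content: a `.csv` file may or may not hold phenotype or health records, so a classification like this gives an upper bound on potentially sensitive files rather than a confirmed count.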

The exposure of UK Biobank data on GitHub is part of a long series of governance challenges for UK Biobank. The most recent came today, with information on all half a million members listed for sale on Alibaba.
