
File Hunter — catalog, deduplicate, and consolidate your archive storage

59 stars · Python

FileHunter: Self-hosted file manager that remembers disconnected drives

by zen-logic · Mar 3, 2026 · 3 points · 3 comments

AI Analysis

●●● Banger · Solve My Problem · Niche Gem · Ship It

Catalog millions of files once, browse offline forever—three-tier hashing kills duplicates across terabytes.

Strengths
  • Solves a genuinely frustrating, under-served problem: knowing what's on drives without constant plugging-in
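  • Clever three-gate hashing strategy (size → xxHash64 partial → SHA-256 full) minimizes I/O by reading file contents only while cheaper checks still match, and the full SHA-256 pass confirms exact duplicates; the author runs it on a catalog of ~7 million files
  • Polished, production-ready UI on a lean stack (Starlette, uvicorn, vanilla JS; no frameworks, no build step) with full keyboard navigation, 17 themes, and an interface that stays responsive during scans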
Weaknesses
  • Niche audience limits viral potential—appeals primarily to archivists and media hoarders, not mainstream users
  • No cloud sync mentioned, and scanning multiple machines into one catalog (remote agents) is a paid Pro feature; the free single-server version may frustrate users managing archives across machines
Target Audience

Power users managing large archives, media professionals, anyone with drawers of USB drives

Similar To

Synology File Station (but offline + deduplication focus) · Tagsistant (file tagging, but no offline catalog)

Post Description

Hi HN. I built File Hunter because I have a drawer full of USB drives and no idea what's on them without plugging each one in.

File Hunter is a self-hosted, web-based file manager. You point it at any folder — USB drive, network share, DVD — and it catalogs everything into SQLite. When you unplug the drive, the full catalog stays. You can browse, search, and review files on storage that isn't connected.
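To make the "catalog stays after you unplug" idea concrete, here is a minimal sketch using only Python's standard library. It is not File Hunter's actual code. The `locations`/`files` schema and the `open_catalog`, `catalog_location`, and `search` helpers are hypothetical. A scan walks the folder once and records path, size, and modification time for each file. Later searches touch only SQLite, so they keep working after the drive is gone.

```python
import os
import sqlite3

# Hypothetical schema for illustration; File Hunter's real schema isn't described in the post.
SCHEMA = """
CREATE TABLE IF NOT EXISTS locations (
    id   INTEGER PRIMARY KEY,
    name TEXT UNIQUE NOT NULL,
    root TEXT NOT NULL
);
CREATE TABLE IF NOT EXISTS files (
    id          INTEGER PRIMARY KEY,
    location_id INTEGER NOT NULL REFERENCES locations(id),
    rel_path    TEXT NOT NULL,
    size        INTEGER NOT NULL,
    mtime       REAL NOT NULL,
    UNIQUE (location_id, rel_path)
);
"""


def open_catalog(db_path):
    conn = sqlite3.connect(db_path)
    conn.executescript(SCHEMA)
    return conn


def catalog_location(conn, name, root):
    """Walk `root` once and record every file under the location called `name`."""
    with conn:  # one transaction for the whole scan
        conn.execute("INSERT OR IGNORE INTO locations (name, root) VALUES (?, ?)", (name, root))
        (loc_id,) = conn.execute("SELECT id FROM locations WHERE name = ?", (name,)).fetchone()
        for dirpath, _dirnames, filenames in os.walk(root):
            for filename in filenames:
                full = os.path.join(dirpath, filename)
                try:
                    st = os.stat(full)
                except OSError:
                    continue  # unreadable or vanished mid-scan: skip it
                conn.execute(
                    "INSERT OR REPLACE INTO files (location_id, rel_path, size, mtime) "
                    "VALUES (?, ?, ?, ?)",
                    (loc_id, os.path.relpath(full, root), st.st_size, st.st_mtime),
                )
    return loc_id


def search(conn, pattern):
    """Query the catalog only; the original storage is never accessed."""
    return conn.execute(
        "SELECT l.name, f.rel_path, f.size FROM files f "
        "JOIN locations l ON l.id = f.location_id "
        "WHERE f.rel_path LIKE ? ORDER BY l.name, f.rel_path",
        (f"%{pattern}%",),
    ).fetchall()
```

The sketch stores paths relative to a named location rather than as absolute paths. That keeps entries meaningful if the same drive later mounts somewhere else, though the post doesn't say whether File Hunter does it this way.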

The other thing it does well is deduplication. A three-tier hashing strategy (file size → xxHash64 partial → SHA-256 full) finds exact duplicates across all your locations with minimal I/O. Then you can consolidate: keep one copy, stub the rest, full audit trail.
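The size → partial hash → full hash cascade can be sketched as follows. This is an illustration under stated assumptions, not File Hunter's implementation. The 64 KiB partial sample size and the helper names are hypothetical. If the third-party `xxhash` package isn't installed, the sketch falls back to a 64-bit BLAKE2b digest as a stand-in for xxHash64. Each gate passes forward only groups that still have two or more members, so most files are never read in full.

```python
import hashlib
import os
from collections import defaultdict

try:
    import xxhash  # third-party: pip install xxhash

    def _fast_hash(data: bytes) -> str:
        return xxhash.xxh64(data).hexdigest()
except ImportError:
    # Stand-in when xxhash is unavailable: 64-bit BLAKE2b (NOT xxHash64).
    def _fast_hash(data: bytes) -> str:
        return hashlib.blake2b(data, digest_size=8).hexdigest()

PARTIAL_BYTES = 64 * 1024  # assumed sample size; the post doesn't specify one


def _partial_hash(path):
    with open(path, "rb") as f:
        return _fast_hash(f.read(PARTIAL_BYTES))


def _full_hash(path):
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):  # 1 MiB reads
            h.update(chunk)
    return h.hexdigest()


def _refine(groups, key_fn):
    """Split each candidate group by key_fn; keep only sub-groups with 2+ members."""
    out = []
    for group in groups:
        buckets = defaultdict(list)
        for path in group:
            buckets[key_fn(path)].append(path)
        out.extend(b for b in buckets.values() if len(b) > 1)
    return out


def find_duplicates(paths):
    # Gate 1: size only (metadata, no content reads)
    by_size = defaultdict(list)
    for p in paths:
        by_size[os.path.getsize(p)].append(p)
    groups = [g for g in by_size.values() if len(g) > 1]
    # Gate 2: cheap hash of the first PARTIAL_BYTES of each same-size file
    groups = _refine(groups, _partial_hash)
    # Gate 3: SHA-256 over the whole file confirms the match
    groups = _refine(groups, _full_hash)
    return [sorted(g) for g in groups]
```

In a real catalog, gate 1 could read straight from the stored sizes and cost no disk I/O. Only files that share a size get opened, and only those that also share a partial hash get read end to end.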

Some numbers: I run it on a catalog of ~7 million files across 9.6 TB and 10 locations. The UI stays responsive during scans.
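The post doesn't explain how the UI stays responsive. One plausible contributor is SQLite's WAL mode, which is listed in the tech stack. Under WAL, a reader holding an open snapshot doesn't block the scanner's commits, and the scanner doesn't block the reader. The minimal demo below uses only the stdlib `sqlite3` module. The table and function names are hypothetical and don't come from File Hunter.

```python
import os
import sqlite3
import tempfile


def _count(conn):
    # fetchall() exhausts the statement so no cursor is left mid-step
    return conn.execute("SELECT COUNT(*) FROM files").fetchall()[0][0]


def wal_snapshot_demo():
    """Show a reader keeping a stable snapshot while a 'scanner' commits."""
    with tempfile.TemporaryDirectory() as d:
        path = os.path.join(d, "catalog.db")  # WAL needs a file, not :memory:
        # isolation_level=None: autocommit, with transactions managed explicitly
        writer = sqlite3.connect(path, isolation_level=None, timeout=0.1)
        mode = writer.execute("PRAGMA journal_mode=WAL").fetchone()[0]
        writer.execute("CREATE TABLE files (id INTEGER PRIMARY KEY, path TEXT)")
        writer.execute("INSERT INTO files (path) VALUES ('a.txt')")

        reader = sqlite3.connect(path, isolation_level=None, timeout=0.1)
        reader.execute("BEGIN")
        before = _count(reader)  # read transaction and snapshot start here

        # The scanner commits a batch while the reader's transaction is still open.
        writer.execute("BEGIN")
        writer.executemany("INSERT INTO files (path) VALUES (?)",
                           [(f"file{i}",) for i in range(1000)])
        writer.execute("COMMIT")

        during = _count(reader)  # same snapshot: the new rows aren't visible yet
        reader.execute("COMMIT")
        after = _count(reader)   # a fresh read sees the committed batch

        reader.close()
        writer.close()
    return mode, before, during, after
```

In SQLite's default rollback-journal mode, the writer's `COMMIT` would instead have to wait for the reader's shared lock to be released. With the short busy timeout above, it would typically fail with "database is locked".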

Tech: Python, Starlette, uvicorn, SQLite (WAL mode), vanilla JavaScript. No frameworks, no build step, no npm. One curl command to install:

curl -fsSL https://filehunter.zenlogic.uk/install | bash

It's MIT-licensed and free. There's a paid Pro tier that adds remote agents (scan machines across the network into one catalog), but everything on the GitHub page is the free version and will stay that way.

Website: https://filehunter.zenlogic.uk
GitHub: https://github.com/zen-logic/file-hunter

Happy to answer questions about the architecture, the dedup strategy, or anything else.

Similar Projects