Back to browse
GitHub Repository

Python library to search over the Epstein Files. AI-powered vector search across unsealed court documents, FBI reports, and flight logs. Runs entirely locally or with API.

6 starsPython

Epstein-Search – Local, AI-Powered Search Engine for the Epstein Files

by simulationship·Mar 1, 2026·1 point·0 comments

AI Analysis

●●SolidBig BrainNiche Gem

Offline RAG over Epstein Files with sentence-transformers and local LLM fallback.

Strengths
  • Entirely offline vector search (no API keys, no data leakage) is genuinely privacy-respecting for sensitive documents
  • Pre-computed embeddings (all-MiniLM-L6-v2) means setup is one minute, not hours of indexing
  • LiteLLM + Ollama/LM Studio integration lets users choose local or cloud LLMs on the fly
Weaknesses
  • 100K pre-computed chunks may be stale if source documents are updated; no refresh mechanism documented
  • Niche corpus (one case) limits utility; not a general-purpose RAG framework
Category
Target Audience

Researchers, journalists, and investigators analyzing public Epstein case documents

Similar To

ChatPDF · DocumentCloud · LlamaIndex

Post Description

Hi HN, I built epstein-search, an open-source Python CLI and library to run semantic search and RAG over the publicly released Epstein Files (unsealed court documents, depositions, FBI reports, and flight logs). I wanted a way to easily navigate through these thousands of pages of unstructured legal PDFs without relying on a paid third-party service or sending data back and forth to a cloud provider. How it works under the hood: Running epstein-search setup downloads ~100K pre-computed document chunks and embeddings (using all-MiniLM-L6-v2) based on the public 20K document corpus. It imports these into zvec (a local vector database) so the index is ready in about a minute. Standard search (epstein-search search) embeds your query locally using sentence-transformers and does a vector similarity search. This step is 100% offline and requires no API keys. For the conversational RAG mode (epstein-search chat or ask), it uses LiteLLM. You can point it to an Ollama or LM Studio instance for a completely free, local, and private pipeline, or plug in a cloud provider like Anthropic, OpenAI, or Gemini. You can also filter queries by document type (e.g., --doc-type flight_log or --source "FBI") and output the raw source context alongside the generated answers to verify the LLM's claims. The dataset is strictly sourced from public domain releases (DOJ, House Oversight Committee, unsealed federal court docs). Repo: https://github.com/simulationship/epstein-search I'd love to hear your thoughts, feedback on the code, or any ideas for improving the local RAG pipeline! Happy to answer any questions.

Similar Projects

Open Source●●Solid

A CLI to query the unsealed court files with local LLMs

RAG over Epstein PDFs works offline, but sensationalism and crypto-tip jar hurt credibility.

Big BrainSolve My Problem
simulationship
203mo ago
Data●●Solid

Jmail Launches Jcal

Google Calendar UI for Epstein flight logs makes navigating thousands of documents actually possible.

Niche GemDark Horse
immatheus
632mo ago
OtherPass

JeffTube

Repackaging DOJ court records as TikTok parody raises liability and taste questions.

dvrp
4943mo ago