GitHub Repository

Python library to search over the Epstein Files. AI-powered vector search across unsealed court documents, FBI reports, and flight logs. Runs entirely locally or with API.

6 stars · Python

A CLI to query the unsealed court files with local LLMs

by simulationship·Feb 25, 2026·2 points·0 comments

AI Analysis

●● Solid · Big Brain · Solve My Problem

RAG over the Epstein PDFs works offline, but a sensationalist tone and a crypto tip jar hurt credibility.

Strengths
  • Genuinely useful pipeline: PDF parsing, chunking, vector indexing, and RAG all work locally without API keys
  • Document filtering by source and type (court filings, flight logs, FBI reports) shows thoughtful data modeling
  • Interactive CLI with mode-switching (/search, /ask, /model) is polished UX
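
As a rough illustration of the filtering and mode-switching noted above, here is a minimal sketch of how a slash-command loop and a metadata filter could be structured. It is not taken from the repository: `Session`, `Chunk`, `handle_line`, `filter_chunks`, the field names, and the default model name are all hypothetical, and only the command names (/search, /ask, /model) come from the listing.

```python
from dataclasses import dataclass
from typing import Optional

MODES = {"search", "ask"}  # command names taken from the listing


@dataclass
class Session:
    """Hypothetical CLI state: the active mode and the selected model."""
    mode: str = "search"
    model: str = "local-default"  # placeholder name, not a real model


@dataclass(frozen=True)
class Chunk:
    """Hypothetical indexed passage with the metadata used for filtering."""
    text: str
    source: str
    doc_type: str  # e.g. "court_filing", "flight_log", "fbi_report"


def handle_line(session: Session, line: str) -> str:
    """Interpret one input line and return a description of the action."""
    line = line.strip()
    if not line.startswith("/"):
        # Plain text is treated as a query in the current mode.
        return f"{session.mode}: {line}"
    command, _, arg = line[1:].partition(" ")
    arg = arg.strip()
    if command in MODES:
        session.mode = command
        # Trailing text after the command runs immediately in the new mode.
        return f"{command}: {arg}" if arg else f"mode set to {command}"
    if command == "model":
        if not arg:
            return f"current model: {session.model}"
        session.model = arg
        return f"model set to {arg}"
    return f"unknown command: /{command}"


def filter_chunks(chunks: list[Chunk],
                  source: Optional[str] = None,
                  doc_type: Optional[str] = None) -> list[Chunk]:
    """Keep chunks matching every filter that was supplied."""
    return [
        c for c in chunks
        if (source is None or c.source == source)
        and (doc_type is None or c.doc_type == doc_type)
    ]
```

Keeping the mode and model in one small state object means each command is a plain state change, and the metadata filter can run before any similarity scoring so that only the relevant document types are ranked.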
Weaknesses
  • Marketing tone in README ('crypto tips,' 'Buy Me A Coffee') undermines its positioning as a serious research tool
  • Pre-built embeddings assume Gemini or Claude; unclear if purely local inference is truly supported
Category
Target Audience

Researchers, investigative journalists, legal analysts, privacy-conscious users wanting offline document search

Similar To

Perplexity Labs · Retrieval-Augmented Generation tools like LlamaIndex · LLaVA document search

Post Description

To succeed on Hacker News (HN), you have to completely drop the "marketing" and "YouTube hook" tone. The HN community heavily downvotes clickbait, sensationalism, and marketing fluff. They love "Show HN" posts, open-source projects, CLI tools, local LLMs, and clever technical solutions to messy data problems (like parsing poorly scanned government PDFs).

Here are the best titles and the exact description (to use either as a text post or your first comment) tailored specifically for the Hacker News audience.

The Hacker News Titles

Choose one of these. On HN, titles should be strictly factual, descriptive, and avoid emojis.

  • Option 1 (The Classic HN Format - Recommended): Show HN: epstein-search – A CLI to query the unsealed court files with local LLMs
  • Option 2 (Focus on the tech pipeline): Show HN: I built a local RAG CLI to make the Epstein PDFs searchable
  • Option 3 (Straight to the point): Show HN: epstein-search – Query the Epstein document dumps offline via CLI

The Hacker News Description (First Comment or Text Body)

If you submit the GitHub URL directly, immediately post this as the first comment. If you submit a text post, put this in the body. Keep the tone humble, technical, and open to feedback.

Hi HN,

When the Epstein court documents and flight logs were unsealed, they were released the way most legal drops are: thousands of pages of messy, poorly scanned, unsearchable PDFs. Standard Ctrl+F doesn't work well due to OCR errors, and the sheer volume makes manual parsing a nightmare.

To solve this, I built epstein-search, an open-source Python CLI tool that lets you search and synthesize the documents using a Retrieval-Augmented Generation (RAG) pipeline directly in your terminal.

How it works:
  • It parses and chunks the original unsealed PDF files.
  • You can run queries against the dataset using API-based models (OpenAI/Anthropic) if you want speed.
  • Privacy-first: If you don't want your queries logged by a third-party API, you can point it directly to a local model (via Ollama or Llama.cpp) to run the entire search and retrieval process 100% offline.

The goal was to make this data accessible to researchers and OSINT investigators without requiring them to manually read thousands of pages of court dockets or hand over their search queries to OpenAI.

Repo is here: https://github.com/simulationship/epstein-search
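
The chunk-then-retrieve flow described in the post can be sketched in a few lines of standard-library Python. This is an illustration under stated assumptions, not the project's code: the function names, chunk sizes, and prompt wording are hypothetical, and the term-count `embed` function is a toy stand-in for a real embedding model.

```python
import math
import re
from collections import Counter


def chunk_text(text: str, size: int = 40, overlap: int = 10) -> list[str]:
    """Split text into overlapping word windows (sizes are illustrative)."""
    if overlap >= size:
        raise ValueError("overlap must be smaller than size")
    words = text.split()
    step = size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + size]))
        if start + size >= len(words):
            break  # the last window already reaches the end of the text
    return chunks


def embed(text: str) -> Counter:
    """Toy stand-in for an embedding model: lowercase term counts."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query: str, chunks: list[str], k: int = 3) -> list[tuple[float, str]]:
    """Return the k chunks most similar to the query, best first."""
    q = embed(query)
    scored = [(cosine(q, embed(c)), c) for c in chunks]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]


def build_prompt(question: str, hits: list[tuple[float, str]]) -> str:
    """Assemble retrieved chunks into a grounded prompt for any LLM backend."""
    context = "\n---\n".join(chunk for _, chunk in hits)
    return (
        "Answer using only the context below.\n\n"
        f"{context}\n\nQuestion: {question}"
    )
```

In a real pipeline the text would come from a PDF or OCR step and `embed` would call an embedding model, but the shape stays the same: overlapping chunks so that a passage split at a boundary still appears whole somewhere, similarity ranking, then a prompt that restricts the model to the retrieved context.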

Similar Projects

AI/ML · ●● Solid

Epstein-Search – Local, AI-Powered Search Engine for the Epstein Files

Offline RAG over Epstein Files with sentence-transformers and local LLM fallback.

Big Brain · Niche Gem
simulationship
103mo ago
Other · Pass

JeffTube

Repackaging DOJ court records as TikTok parody raises liability and taste questions.

dvrp
4943mo ago
AI/ML · ●● Solid

Open KB: Open LLM Knowledge Base

Compiled wiki beats query-time RAG with vectorless PageIndex retrieval for long PDFs.

Big Brain · Niche Gem
mingtianzhang
621mo ago