GitHub Repository

Fırat University Assistant: An offline Turkish question-answering and document search system built on local PDFs using FastAPI, pdfplumber, and BM25.

8 stars · Python

An offline document search engine for my university's messy PDFs

by Yigtwx·Feb 27, 2026·2 points·1 comment

AI Analysis

Mid · Niche Gem · Solve My Problem

BM25 plus Turkish NLP for one university's messy PDFs: it solves a real problem, but for a narrow audience.

Strengths
  • Robust Turkish text pipeline with stemming, normalization, and bigram bonuses for local accuracy.
  • PDF parsing handles dual-column layouts, header/footer removal, and hyphen fixes—respects real document chaos.
  • Offline-first design with anti-hallucination guardrails: the system either answers from the documents or refuses, rather than confabulating.
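
To make the retrieval idea concrete, here is a minimal sketch of BM25 scoring with Turkish-aware lowercasing, a bigram bonus, and a refusal threshold. It is not the project's code: the names `tr_lower`, `BM25`, `bigram_bonus`, and `min_score`, the parameter values, and the sample documents are all assumptions made for illustration, and stemming is left out to keep the example short.

```python
import math
import re
from collections import Counter


def tr_lower(text: str) -> str:
    """Lowercase with Turkish casing rules: I -> ı and İ -> i."""
    return text.replace("I", "ı").replace("İ", "i").lower()


def tokenize(text: str) -> list[str]:
    """Split into lowercase word tokens (no stemming in this sketch)."""
    return re.findall(r"\w+", tr_lower(text))


class BM25:
    """Hypothetical in-memory BM25 index over a small list of non-empty passages."""

    def __init__(self, docs: list[str], k1: float = 1.5, b: float = 0.75):
        self.k1, self.b = k1, b
        self.tokens = [tokenize(d) for d in docs]
        self.tf = [Counter(t) for t in self.tokens]
        self.n = len(docs)
        self.avgdl = sum(len(t) for t in self.tokens) / self.n
        # Document frequency: number of passages containing each term.
        self.df = Counter(term for tf in self.tf for term in tf)

    def idf(self, term: str) -> float:
        n_t = self.df.get(term, 0)
        return math.log((self.n - n_t + 0.5) / (n_t + 0.5) + 1)

    def score(self, query: str, i: int, bigram_bonus: float = 0.5) -> float:
        q = tokenize(query)
        tf, dl = self.tf[i], len(self.tokens[i])
        norm = self.k1 * (1 - self.b + self.b * dl / self.avgdl)
        total = 0.0
        for term in q:
            f = tf.get(term, 0)
            total += self.idf(term) * f * (self.k1 + 1) / (f + norm)
        # Assumed bonus scheme: reward adjacent query terms that are
        # also adjacent in the passage.
        doc_bigrams = set(zip(self.tokens[i], self.tokens[i][1:]))
        total += bigram_bonus * sum(bg in doc_bigrams for bg in zip(q, q[1:]))
        return total

    def search(self, query: str, min_score: float = 1.0):
        """Return (passage index, score), or None to refuse a weak match."""
        best = max(range(self.n), key=lambda i: self.score(query, i))
        s = self.score(query, best)
        return (best, s) if s >= min_score else None


docs = [
    "Kayıt yenileme tarihleri akademik takvimde ilan edilir.",
    "Yemekhane menüsü her hafta güncellenir.",
    "IŞIK laboratuvarı İkinci katta.",
]
bm = BM25(docs)
hit = bm.search("kayıt yenileme")   # matches the first passage
miss = bm.search("otopark ücreti")  # no term overlap, so the search refuses
```

The custom lowercasing matters because Python's built-in `"IŞIK".lower()` produces `"işik"`, which would not match a query typed as `ışık`. Returning `None` below `min_score` is one simple way to get the "answer or refuse" behavior described above; the threshold would need tuning on real documents.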
Weaknesses
  • Scope limited to one institution's PDF collection; unclear if generalizable to other universities or languages.
  • BM25 without embeddings may miss semantic relevance that modern RAG systems handle better.
Target Audience

University students, educators, document-heavy organizations needing offline search

Similar To

Weaviate · Milvus · LlamaIndex

Similar Projects