GitHub Repository

A fully local Retrieval-Augmented Generation (RAG) implementation for querying 25 years of Swiss Teletext news (500k articles in German)

11 stars · Jupyter Notebook

Gemma 4 based local RAG on 25 Years of news articles

by folli · Apr 3, 2026 · 1 point · 0 comments

AI Analysis

●● Solid · Niche Gem · Big Brain

500k-article Swiss Teletext corpus makes this RAG demo actually interesting.

Strengths
  • Hybrid search combining vector + full-text for high-recall retrieval on real corpus
  • Fully local execution with pgvector means no data leaves your machine
  • Teletext's high-density summaries are genuinely clever source material for RAG
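One common way to combine vector and full-text results is reciprocal rank fusion (RRF). The sketch below is only an illustration of that idea: the listing does not say how the repository merges its two result lists, so the function name, the constant `k=60`, and the article ids are all assumptions.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Merge several ranked lists of document ids into one ranking.

    Each document scores 1 / (k + rank) for every list it appears in;
    documents found by both searches therefore rise to the top.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest score first; ties broken by id so the output is deterministic.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


# Hypothetical article ids, standing in for the two search result lists.
vector_hits = ["a17", "a03", "a42"]    # e.g. nearest neighbours by embedding
fulltext_hits = ["a03", "a99", "a17"]  # e.g. keyword matches

fused = reciprocal_rank_fusion([vector_hits, fulltext_hits])
print(fused)
```

Articles returned by both searches (`a03`, `a17`) rank ahead of those found by only one, which is the recall benefit hybrid retrieval aims for.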
Weaknesses
  • German-only corpus limits broader applicability and testing
  • Proof of concept rather than general-purpose tool for other datasets
Category
Target Audience

Developers experimenting with local RAG pipelines and German-language NLP

Similar To

LlamaIndex · LangChain · PrivateGPT

Post Description

A fully local Retrieval-Augmented Generation (RAG) implementation for querying 25 years of Swiss Teletext news (~500k articles in German), based on DeepMind's most recent Gemma model.

Why? I thought this was a cool type of dataset (short, high-density news summaries) for testing some local RAG approaches. Gemma 4 gives some impressive results, though the system prompt could probably use more tweaking.
