Back to browse
GitHub Repository

Semantic search over videos using Gemini Embedding 2 or Qwen3-VL.

4,276 starsPython

SentrySearch – Semantic search over dashcam footage

by sohamrj·Mar 18, 2026·2 points·0 comments

AI Analysis

●●●BangerBig BrainNiche Gem

Native video embedding beats captioning pipelines — Gemini 2 does the heavy lifting.

Strengths
  • Direct video-to-text embedding skips the transcription bottleneck entirely.
  • ChromaDB indexing with ffmpeg auto-trim makes results immediately usable.
  • $2.50/hr indexing cost is reasonable for the capability you get.
Weaknesses
  • Locked into Google's Gemini API — no local or alternative model support.
  • Still frames and scene detection optimizations are marked as TODO.
Category
Target Audience

Developers working with video footage, security applications, dashcam users

Similar To

JinaAI · Firecrawl · Video understanding APIs

Post Description

I built a semantic search CLI for dashcam footage using Gemini Embedding 2's native video embedding.

The interesting part: Gemini Embedding 2 projects raw mp4 video directly into the same vector space as text, no captioning or transcription pipeline. You embed 30-second video chunks as RETRIEVAL_DOCUMENT, embed a text query as RETRIEVAL_QUERY, and cosine similarity just works across modalities.

The tool splits footage into overlapping chunks, indexes them in a local ChromaDB instance, and auto-trims the top match from the original file via ffmpeg.

Cost is about $2.50/hr of footage to index, queries are negligible. Definitely room to optimize: skipping still frames, scene detection for smarter chunking, etc.

Similar Projects

Open Source●●Solid

A CLI to query the unsealed court files with local LLMs

RAG over Epstein PDFs works offline, but sensationalism and crypto-tip jar hurt credibility.

Big BrainSolve My Problem
simulationship
203mo ago