Back to browse
GitHub Repository

Semantic search over videos using Gemini Embedding 2 or Qwen3-VL.

4,272 starsPython

Gemini can now natively embed video, so I built sub-second video search

by sohamrj·Mar 24, 2026·438 points·108 comments

AI Analysis

●●●BangerWizardryZero to OneBig Brain

Direct video-to-vector embedding skips transcription entirely—Twelve Labs but self-hosted.

Strengths
  • Gemini's native video embedding avoids transcription errors and captures visual context
  • Still-frame detection skips idle chunks, cutting costs for security camera footage
  • Working CLI with auto-trim output solves a genuinely painful search problem
Weaknesses
  • Requires Gemini API key; no offline embedding option for sensitive footage
  • Dashcam niche limits broader adoption beyond security and fleet use cases
Category
Target Audience

Developers working with dashcam or security camera footage

Similar To

Twelve Labs · Google Vertex AI Video Intelligence

Post Description

Gemini Embedding 2 can project raw video directly into a 768-dimensional vector space alongside text. No transcription, no frame captioning, no intermediate text. A query like "green car cutting me off" is directly comparable to a 30-second video clip at the vector level.

I used this to build a CLI that indexes hours of footage into ChromaDB, then searches it with natural language and auto-trims the matching clip. Demo video on the GitHub README. Indexing costs ~$2.50/hr of footage. Still-frame detection skips idle chunks, so security camera / sentry mode footage is much cheaper.

Similar Projects

Developer Tools●●Solid

Fallow – Find unused code, duplication, and complexity in TS/JS (Rust)

Sub-second dead code detection in Rust when ESLint and TypeScript already exist.

SlickSolve My Problem
bartwaardenburg
422mo ago