A $400 Google Video API bill for 4 videos. I built my own
Replaces a $400 Google API bill with local YOLO, DeepFace, and Whisper running in Docker.
Local-first Video Knowledge Base. Index your video library with multi-modal analysis (YOLO, DeepFace, Whisper), search semantically via natural language, Docker-ready.
Local YOLO and Whisper indexing beats Google's $400 API bill for private video search.
Video editors, content creators, and archivists with large local video libraries
Google Video Intelligence API · Twelve Labs
I decided to build my own tool with 3 hard requirements: it has to transcribe videos, it has to analyse video frames, and everything has to run locally.
I don't wanna deal with storing my videos in the cloud because of two concerns: privacy and storage cost.
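To make the idea concrete, here is a minimal sketch of how transcript segments and per-frame detections could be merged into time-aligned, searchable records. This is not the project's actual code: `Scene`, `build_scenes`, `search`, and the 10-second window are hypothetical names and values chosen for illustration, the inputs are plain tuples standing in for the output of a speech-to-text model and an object detector, and the keyword matching is only a stand-in for real semantic search.

```python
from dataclasses import dataclass, field


@dataclass
class Scene:
    """One fixed-length window of a video with everything known about it."""
    start: float  # window start, in seconds
    end: float    # window end, in seconds
    transcript: list[str] = field(default_factory=list)
    objects: set[str] = field(default_factory=set)


def build_scenes(segments, detections, window=10.0):
    """Group transcript segments and frame detections into time windows.

    segments:   iterable of (start_seconds, text), e.g. from a transcription model
    detections: iterable of (timestamp_seconds, label), e.g. from an object detector
    window:     window length in seconds (an arbitrary choice for this sketch)
    """
    scenes = {}

    def scene_at(t):
        # Map a timestamp to its window, creating the record on first use.
        idx = int(t // window)
        if idx not in scenes:
            scenes[idx] = Scene(start=idx * window, end=(idx + 1) * window)
        return scenes[idx]

    for start, text in segments:
        scene_at(start).transcript.append(text)
    for ts, label in detections:
        scene_at(ts).objects.add(label)

    # Return the windows in chronological order.
    return [scenes[i] for i in sorted(scenes)]


def search(scenes, query):
    """Rank scenes by how many query words appear in their transcript or labels.

    Keyword overlap is used here only as a placeholder for embedding-based search.
    """
    words = set(query.lower().split())
    scored = []
    for scene in scenes:
        spoken = set(" ".join(scene.transcript).lower().split())
        seen = {label.lower() for label in scene.objects}
        score = len(words & (spoken | seen))
        if score:
            scored.append((score, scene))
    # Highest score first; ties keep chronological order because sort is stable.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [scene for _, scene in scored]
```

Calling `build_scenes` with a few `(time, text)` and `(time, label)` tuples and then `search(scenes, "dog")` returns the windows where "dog" was either spoken or detected. Everything stays in memory on the local machine, which matches the local-only requirement; a real index would swap the keyword overlap for vector embeddings and persist the records.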
I've been working on it for the last couple of months. There is a source-available version that is free to use (personal use, and commercial use by companies with fewer than 5 people). It's available here (https://github.com/IliasHad/edit-mind), and the project has 1.3k GitHub stars.
Now, I'm building a desktop app with direct NLE integration (Final Cut Pro, DaVinci Resolve, and Adobe Premiere Pro). This includes an editing agent that understands your footage and your editing style. (https://edit-mind.com)
Demo Video: https://youtu.be/jcctyfVg_34
MCP integration with Cursor and Claude Code sets this apart from generic RAG tools.
Zero-knowledge encryption for transcripts is nice, but Otter.ai and TurboScribe exist.
Uses YOLO to detect cats before streaming, avoiding boring empty bed footage.
Curated prompt templates for Claude and Gemini coding agents.
Preserves document structure instead of flattening to text like most RAG tools.