I indexed 37h of my videos using an RTX 4090 and local ML models in 24h

by iliashad · Jun 30, 2026 · 3 points · 1 comment

AI Analysis

●● Solid · Wizardry · Big Brain

RTX 4090 crushes M1 Max for local video indexing with six ML plugins running in parallel.

Strengths

• Plugin architecture (DescriptorPlugin, FaceRecognitionPlugin, TextDetectionPlugin) enables modular video analysis.
• Detailed per-frame timing metrics show genuine engineering depth and benchmarking rigor.
• Self-hosted Docker deployment gives full control versus cloud video analysis APIs.

Weaknesses

• Submitting benchmark JSON instead of the actual tool limits what others can evaluate or use.
• Video indexing space has established players (Twelve Labs, VideoDB) with more complete offerings.

Post Description

TL;DR: This is a follow-up to my recent blog post and Hacker News post (https://news.ycombinator.com/item?id=48528029), where I ran the desktop app on my M1 Max. This time, I'm using the self-hosted version, running in Docker, with an NVIDIA RTX 4090 (24 GB of VRAM).

The content is also fundamentally more demanding: long podcast episodes with at least two faces in every frame, coding tutorials packed with on-screen text, and screen recordings. GoPro footage, by contrast, is mostly wide outdoor shots.

Even so, the NVIDIA GPU was much faster than my M1 Max.

The longest video was a livestream of 3h 12m indexed in 1h 52m (4,612 frames analyzed).
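
As a rough sanity check on these figures, here is a back-of-envelope sketch in Python. It uses only the numbers quoted in this post (3h 12m of video, 1h 52m of indexing, 4,612 frames, and the rounded 37h / 24h totals from the title). The variable and function names are made up for illustration, and it assumes the 24h figure is total processing time and that frames were sampled evenly, neither of which the post confirms.

```python
from datetime import timedelta


def realtime_factor(video: timedelta, processing: timedelta) -> float:
    """Hours of video indexed per hour of processing (hypothetical helper)."""
    return video / processing  # timedelta / timedelta yields a float


# Figures quoted in the post for the longest video (a livestream).
longest_video = timedelta(hours=3, minutes=12)
longest_processing = timedelta(hours=1, minutes=52)
frames_analyzed = 4612

livestream_factor = realtime_factor(longest_video, longest_processing)

# Average wall-clock time spent per analyzed frame.
seconds_per_frame = longest_processing.total_seconds() / frames_analyzed

# Average gap between analyzed frames, ASSUMING even sampling across the video.
sampling_interval = longest_video.total_seconds() / frames_analyzed

# Rounded totals from the title; ASSUMES 24h is total processing time.
overall_factor = realtime_factor(timedelta(hours=37), timedelta(hours=24))

print(f"Livestream: {livestream_factor:.2f}x real time")          # 1.71x
print(f"Processing per analyzed frame: {seconds_per_frame:.2f} s")  # 1.46 s
print(f"One frame analyzed per {sampling_interval:.2f} s of video") # 2.50 s
print(f"Whole library: {overall_factor:.2f}x real time")           # 1.54x
```

Under those assumptions, the livestream was indexed at about 1.7x real time, with roughly one analyzed frame for every 2.5 seconds of video and about 1.5 seconds of processing per analyzed frame.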

You can see the processing job results directly, in JSON format, here: https://gist.github.com/IliasHad/fd64e4d331e90e57d61e95f64e8...