Run any VLM on real-time video

by zakariaelhjouji·Mar 8, 2026·1 point·0 comments

AI Analysis

● Mid·Ship It

3-line real-time VLM API, but competing products handle camera inference already.

Strengths

•Genuinely simple SDK surface: three lines of code to chain camera input → model → callback is excellent DX
•Positioned for growth across assistive tech, home security, and moderation—clear use case ladder
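
The "camera input → model → callback" chain praised above can be sketched in a few lines. This is only an illustration of the pattern, not the project's SDK: `run_pipeline`, `every_n`, `fake_camera`, and `fake_vlm` are hypothetical names, and the camera and model are replaced by stand-ins so the sketch runs anywhere.

```python
from typing import Callable, Iterable, List, TypeVar

Frame = TypeVar("Frame")
Result = TypeVar("Result")


def run_pipeline(
    frames: Iterable[Frame],
    model: Callable[[Frame], Result],
    on_result: Callable[[Result], None],
    every_n: int = 1,
) -> int:
    """Feed every n-th frame to the model and pass each result to the callback.

    Returns the number of frames that were actually sent to the model.
    """
    if every_n < 1:
        raise ValueError("every_n must be at least 1")
    processed = 0
    for index, frame in enumerate(frames):
        if index % every_n != 0:
            continue  # skip frames so a slow model does not fall behind
        on_result(model(frame))
        processed += 1
    return processed


# Stand-ins (assumptions): a generator instead of a camera, a lambda instead of a VLM.
fake_camera = (f"frame-{i}" for i in range(6))
fake_vlm = lambda frame: f"caption for {frame}"
captions: List[str] = []

count = run_pipeline(fake_camera, fake_vlm, captions.append, every_n=2)
```

With a real camera the `frames` iterable would yield images and `model` would wrap an actual VLM call; the frame-skipping parameter is one simple way to trade coverage for latency.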

Weaknesses

•Minimal technical differentiator: camera-to-model inference is table stakes for inference platforms (Replicate, Banana, Modal)
•Landing page shows no benchmarks, latency claims, or pricing—unclear value vs. Gradio/Streamlit video support or native SDKs
•No code, no open-source release, and no hosted demo, so there is no evidence of actual capability beyond the marketing copy

Similar Projects

AI/ML·●● Solid

2500 vision benchmarks / evals for Vision Language Models

Daily arXiv scraping with Claude classification beats manual curation.

Niche Gem·Big Brain

zakariaelhjouji

102mo ago

AI/ML·●●●● Gem

Can I run a language model on a 26-year-old console?

Streams LLM weights from CD-ROM during inference to fit 77MB models in 32MB RAM.

Wizardry·Zero to One·Big Brain

xaskasdf

46122mo ago

Developer Tools·●● Solid

A Pure-Python Computer Vision Library That's Fast and Minimal

Multi-threaded video capture fixes OpenCV's standard blocking I/O bottleneck for Python pipelines.

Solve My Problem·Niche Gem

abhiTronix

1029d ago

AI/ML·●●● Banger

Marlin-2B: a tiny VLM to extract structured information from videos

Beats Qwen2.5-VL-7B on temporal grounding while running on a single consumer GPU.

Dark Horse·Big Brain

HappyPablo

7228d ago

AI/ML·●●● Banger

Cursed Browser – a VLM reads the HTML and hallucinates the page

Ditches the rendering engine entirely to let a VLM hallucinate the pixels from HTML.

Wizardry·Bold Bet·Rabbit Hole

scosman

7121d ago

AI/ML·●● Solid

Agentic – Vesta AI Explorer

Runs Foundation Models on the Neural Engine and can also host MLX/GGUF models locally, while offering an in-app HuggingFace browser, on-device WhisperKit/TTS, vision analysis, and image/video generation, all in a native SwiftUI shell. Exposing 33+ tools over TCP via the Model Context Protocol is a clever move for automation and orchestration, but the macOS-only scope and crowded local-LLM space make it a powerful niche play rather than a universal winner.

Wizardry·Slick

scouzi1966

114mo ago