GitHub Repository

AI-powered screen memory: captures, analyzes, and lets you search and chat with your screen history. Powered by Gemma 4. 100% local, 100% private.

152 stars · Python

Running a vision model on every screenshot on-device


by alexkarpathy·Jun 29, 2026·16 points·1 comment


AI Analysis

●●● Banger · Zero to One · Big Brain

Gemma 4 vision model running entirely locally beats Microsoft Recall on privacy.

Strengths

• Content-change detection captures only when the screen actually changes
• Hybrid search combines MiniLM embeddings with FTS5 keyword matching
• MCP server integration works with Claude, Cursor, and VSCode
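
One common way to combine a keyword ranking with an embedding ranking is reciprocal rank fusion (RRF). The sketch below is an illustration only: the page does not say how ScreenMind merges its FTS5 and MiniLM results, so the fusion method, the function name `reciprocal_rank_fusion`, and the screenshot ids are all assumptions.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of ids into a single ranking.

    Each id scores sum(1 / (k + rank)) over the lists it appears in,
    with rank starting at 1. k=60 is the constant commonly used for RRF.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by id so output is deterministic.
    return sorted(scores, key=lambda d: (-scores[d], d))


# Hypothetical result lists for one query (ids are made up):
keyword_hits = ["shot_17", "shot_03", "shot_42"]   # e.g. from a keyword index
semantic_hits = ["shot_03", "shot_88", "shot_17"]  # e.g. from embedding similarity

fused = reciprocal_rank_fusion([keyword_hits, semantic_hits])
print(fused)  # ['shot_03', 'shot_17', 'shot_88', 'shot_42']
```

Screenshots that appear in both lists rise to the top, while hits found by only one method are still kept, which is the usual motivation for a hybrid setup.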

Weaknesses

• Gemma 4 local inference requires significant RAM and compute resources
• Background service complexity may impact battery life on laptops

Post Description

Hi, author here. ScreenMind is a privacy-first Microsoft Recall alternative. It runs locally on Gemma 4, one of the few models that supports vision, audio, and reasoning all in one, so your data never leaves your machine.

With ScreenMind you can keep track of your timeline (how much time you spent on what) and search for any screenshot by any text on it. The coolest thing: you can chat with your screen history, like "what did Alex text me on Discord?" or "did I receive any mail from Microsoft?". If it was on your screen, you can ask about it in the chat. You can also build automations on top of it, like "send me my whole-day report on Slack" (it has integrations). Automations can be written in plain English for non-coders, or in Python for devs who want to dive deeper. You can also save voice memos (with a screenshot) with just a hotkey, and get your meetings transcribed and summarised (meetings are auto-detected).

The hardest part was keeping ScreenMind running as a background service. It would not have been hard if the chat feature didn't exist, but running a local model requires compute, and analyzing screenshots continuously would keep all the resources hogged. For that I came up with a perceptual hash cache: the three-tier cache system reduces inference by up to 40% for an average user (which is me). To cut inference time further, I added three modes (fast, balanced, and accurate), where the tradeoff is between time and accuracy.
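
The idea behind a perceptual hash cache can be sketched as follows: hash each frame so that near-identical screenshots get near-identical hashes, and skip the model when a close enough hash has already been analyzed. This is a minimal single-tier sketch, not ScreenMind's implementation; the post does not describe its three tiers, and the difference hash, the `AnalysisCache` class, the `max_distance` threshold, and the tiny pixel grids are all assumptions made for illustration.

```python
def dhash(pixels):
    """Difference hash of a grayscale image given as rows of ints (0-255).

    Each bit records whether a pixel is brighter than its right neighbour.
    A real pipeline would first downscale the screenshot to a small grid.
    """
    bits = 0
    for row in pixels:
        for left, right in zip(row, row[1:]):
            bits = (bits << 1) | (1 if left > right else 0)
    return bits


def hamming(a, b):
    """Number of differing bits between two hashes."""
    return bin(a ^ b).count("1")


class AnalysisCache:
    """Hypothetical cache: reuse a result when a stored hash is close enough."""

    def __init__(self, max_distance=2):
        self.max_distance = max_distance
        self.entries = []  # list of (hash, result) pairs

    def lookup(self, h):
        for cached_hash, result in self.entries:
            if hamming(cached_hash, h) <= self.max_distance:
                return result
        return None

    def store(self, h, result):
        self.entries.append((h, result))


def analyze(pixels, cache, run_model):
    """Return (result, was_cached); call run_model only on a cache miss."""
    h = dhash(pixels)
    cached = cache.lookup(h)
    if cached is not None:
        return cached, True
    result = run_model(pixels)
    cache.store(h, result)
    return result, False


calls = []

def fake_model(pixels):
    # Stand-in for the expensive vision model call.
    calls.append(1)
    return "description #%d" % len(calls)

cache = AnalysisCache(max_distance=2)
frame_a = [[10, 20, 30, 40], [40, 30, 20, 10], [5, 5, 5, 5]]
frame_b = [[12, 22, 32, 42], [41, 31, 21, 11], [6, 6, 6, 6]]  # slightly brighter copy of frame_a
frame_c = [[40, 30, 20, 10], [10, 20, 30, 40], [9, 1, 9, 1]]  # different content

first = analyze(frame_a, cache, fake_model)
second = analyze(frame_b, cache, fake_model)
third = analyze(frame_c, cache, fake_model)
print(first, second, third, "model calls:", len(calls))
```

Here the brighter copy hashes identically to the first frame and is served from the cache, so the stand-in model runs twice for three frames. The same hash comparison could also drive content-change detection, by skipping capture when consecutive frames are within the threshold.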

For now I use it daily on my 4 GB GTX 1650 in fast mode and it works pretty well; it would be much faster on a high-end machine. It also has an MCP server, so you can just ask Claude Desktop or Cursor about the bug you saw in the morning.

Supports Windows/Mac/Linux.

Being upfront about rough edges: it is not extensively tested on Mac, and installation has some friction, which I'm working to fix with a one-click installer.

(Reposting: I put up an earlier version a few months back, but comments got flagged because of my new account, so I couldn't reply to any.)

Repo: github.com/ayushh0110/ScreenMind

Curious whether anyone has ideas for how to approach multi-monitor support.

Similar Projects

AI/ML · ●●● Banger

Running a vision model on every screenshot on-device

Privacy-first Recall alternative running Gemma 4 vision entirely on-device.

Big Brain · Zero to One

skye0110

106d ago

AI/ML · ●● Solid

I run a vision model on every screenshot, locally, on a 4GB GPU

Runs multimodal screen memory on 4GB VRAM while Microsoft Recall requires high-end NPU.

Dark Horse · Solve My Problem

skye0110

36715d ago

Developer Tools · ●● Solid

Scan0tron – AI screen capture that auto-fills forms ($49)

Computer vision + Playwright automation for form filling, but the $49 price is a hard sell in a crowded category.

Solve My Problem · Slick

jaydurangodev

103mo ago

AI/ML · ●● Solid

UX Agent Mac app running continuously locally using Gemma 4

Local Gemma 4 vision agent critiques UI continuously without sending screenshots to the cloud.

Cozy · Big Brain

tommyjepsen

102mo ago

Developer Tools · ●● Solid

GrabShot – Screenshot API with AI cleanup and device frames

The product packs sensible, practical features — device frames, element selectors, dark-mode capture, ad-blocking, and an OG-friendly public-key cache flow — into one API call and a clean demo. The AI 'cleanup' and performance claims (sub-3s renders, <50ms cached) are interesting selling points but the space is crowded; the real winner will be reliability, price, and how well the AI cleanup handles messy pages.

Slick · Solve My Problem · Crowd Pleaser

grabshot_dev

104mo ago

Developer Tools · ● Mid

SnapToCode – Screenshot any UI and get clean Tailwind code

Yet another screenshot-to-code tool when v0 and Builder.io already dominate this space.

Ship It

adithagrawaal

2023d ago