Detect any object in satellite imagery using a text prompt

Author: eyasu6464

by eyasu6464 · Mar 8, 2026 · 22 points · 7 comments


AI Analysis

●● Solid · Big Brain · Niche Gem

Tile-based VLM inference with coordinate projection, but dense objects still need YOLO.

Strengths

• Zero-shot object detection in satellite imagery without login or training
• Clever mercantile tile slicing + WGS84 projection pipeline for geographic accuracy
• Honest about limitations: dense/occluded structures where narrow models win
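The tile-slicing step can be illustrated with standard Web Mercator (XYZ "slippy map") tile math. This is a minimal, stdlib-only sketch, not the project's code: it assumes the drawn polygon is reduced to its lon/lat bounding box, ignores antimeridian crossing, and does not drop tiles that fall outside the polygon itself. The helper names (`polygon_bbox`, `lnglat_to_tile`, `tiles_for_bbox`) are hypothetical; in practice the mercantile library's `mercantile.tiles(west, south, east, north, zooms)` covers the same ground.

```python
import math

# Web Mercator cannot represent the poles; tiles stop at this latitude.
MAX_LAT = 85.0511287798


def polygon_bbox(coords):
    """Return (west, south, east, north) for a list of (lng, lat) vertices."""
    lngs = [c[0] for c in coords]
    lats = [c[1] for c in coords]
    return min(lngs), min(lats), max(lngs), max(lats)


def lnglat_to_tile(lng: float, lat: float, zoom: int) -> tuple[int, int]:
    """Return the (x, y) XYZ tile index containing a lon/lat point."""
    n = 2 ** zoom
    # Clamp latitude so tan() stays finite near the poles.
    lat = max(min(lat, MAX_LAT), -MAX_LAT)
    x = int((lng + 180.0) / 360.0 * n)
    # Forward Mercator: y grows southward from the top edge of the map.
    y = int((1.0 - math.asinh(math.tan(math.radians(lat))) / math.pi) / 2.0 * n)
    # Points exactly on the east/south edge would map to index n; keep them in range.
    return min(max(x, 0), n - 1), min(max(y, 0), n - 1)


def tiles_for_bbox(west, south, east, north, zoom):
    """List every (x, y, z) tile that covers the bounding box (no antimeridian wrap)."""
    x_min, y_min = lnglat_to_tile(west, north, zoom)  # top-left corner
    x_max, y_max = lnglat_to_tile(east, south, zoom)  # bottom-right corner
    return [
        (x, y, zoom)
        for x in range(x_min, x_max + 1)
        for y in range(y_min, y_max + 1)
    ]
```

For a small box around the origin at zoom 1, `tiles_for_bbox(-10, -10, 10, 10, 1)` returns all four zoom-1 tiles. Each zoom level doubles the tile count per axis, so for the same area the number of tiles (and therefore VLM calls) grows roughly fourfold per level, which is the main cost of picking a higher zoom.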

Weaknesses

• Struggles in harder real-world scenarios (dense urban areas, occlusion) where specialized models perform better
• No clear differentiation from commercial satellite analytics (Planet, Maxar, Sentinel Hub integrations)

Post Description

I built a browser-based tool that uses Vision-Language Models (VLMs) to detect objects in satellite imagery from natural-language prompts. Draw a polygon on the map, type what you want to find (e.g., "swimming pools," "oil tanks," "solar panels"), and the system scans the area tile by tile, projecting bounding boxes back onto the globe as GeoJSON. The pipeline: pick zoom level + prompt → slice map into mercantile tiles → feed each tile + prompt to the VLM → get bounding boxes → project to WGS84 coordinates → render on map. No login is required for the demo. It works well zero-shot for distinct structures but struggles with dense or occluded objects, where narrow YOLO models still win.
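The last pipeline steps, turning a per-tile bounding box into WGS84 coordinates and emitting GeoJSON, can be sketched as below. This is an illustrative sketch under stated assumptions, not the author's implementation: it assumes the VLM's answer has already been parsed into pixel coordinates `(x0, y0, x1, y1)` with the origin at the tile's top-left corner, and that tiles are 256 pixels wide. VLMs differ in output format (some return normalized coordinates), so that parsing step is left out, and the helper names `pixel_to_lnglat` and `bbox_to_feature` are hypothetical.

```python
import math

TILE_SIZE = 256  # assumed tile width/height in pixels; some imagery services use 512


def pixel_to_lnglat(x, y, z, px, py, tile_size=TILE_SIZE):
    """Convert a pixel (px, py) inside XYZ tile (x, y, z) to (lng, lat) in degrees."""
    n = 2 ** z
    # Fractional global tile coordinates of the pixel.
    gx = x + px / tile_size
    gy = y + py / tile_size
    lng = gx / n * 360.0 - 180.0
    # Inverse Mercator: pixel rows are evenly spaced in Mercator y, not in latitude.
    lat = math.degrees(math.atan(math.sinh(math.pi * (1.0 - 2.0 * gy / n))))
    return lng, lat


def bbox_to_feature(tile, pixel_box, label, tile_size=TILE_SIZE):
    """Project a pixel-space box on one tile to a GeoJSON Polygon Feature."""
    x, y, z = tile
    x0, y0, x1, y1 = pixel_box
    west, north = pixel_to_lnglat(x, y, z, x0, y0, tile_size)  # top-left pixel
    east, south = pixel_to_lnglat(x, y, z, x1, y1, tile_size)  # bottom-right pixel
    # Closed, counterclockwise exterior ring, as RFC 7946 recommends.
    ring = [[west, south], [east, south], [east, north], [west, north], [west, south]]
    return {
        "type": "Feature",
        "geometry": {"type": "Polygon", "coordinates": [ring]},
        "properties": {"label": label, "tile": [x, y, z]},
    }
```

Collecting these features into `{"type": "FeatureCollection", "features": [...]}` gives a layer most web map libraries can render directly. Latitude is computed per pixel with the inverse Mercator formula instead of linearly interpolating a tile's latitude bounds (such as those from `mercantile.bounds`); the error from linear interpolation is negligible inside one tile at high zoom but grows at low zoom.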