Satellite imagery object detection using text prompts

Author: eyasu6464

by eyasu6464·Mar 9, 2026·53 points·23 comments


AI Analysis

● Mid · Ship It · Eye Candy

VLM-based satellite detection sounds good until you remember that YOLO and other specialized detectors handle occlusion better.

Strengths

•Zero-shot prompting on satellite tiles is a clever pipeline: tile selection, WGS84 coordinate conversion, and GeoJSON projection are all sensible choices.
•A no-login, browser-based demo lowers the friction of trying it; the UI for polygon drawing and layer management is clean.

Weaknesses

•The author explicitly admits that specialized detectors (YOLO) handle occlusion better, so you pay VLM inference latency and still get weaker results on occluded objects.
•Satellite object detection is already well served by Maxar, Planet Labs, Esri, and open models like YOLO; it is unclear what new capability this adds.

Post Description

I built a browser-based tool for detecting objects in satellite imagery using vision-language models (VLMs). You draw a polygon on the map and enter a text prompt such as "swimming pools", "oil tanks", or "buses". The system scans the selected area tile-by-tile and returns detections projected back onto the map as GeoJSON.
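To make the GeoJSON output concrete, here is a minimal sketch of how detections that are already in WGS84 might be packaged for the map. It assumes each detection is an axis-aligned box given as (west, south, east, north) in degrees; the function names and the "prompt" property are hypothetical, not taken from the project.

```python
import json
from typing import Dict, List, Tuple

# Hypothetical box format: (west, south, east, north) in WGS84 degrees.
BBox = Tuple[float, float, float, float]


def box_to_feature(box: BBox, prompt: str) -> Dict:
    """Turn one WGS84 bounding box into a GeoJSON Polygon Feature."""
    west, south, east, north = box
    # GeoJSON positions are [longitude, latitude]. The ring is closed by
    # repeating the first position, and it runs counterclockwise
    # (SW -> SE -> NE -> NW -> SW), as RFC 7946 recommends for exteriors.
    ring = [
        [west, south],
        [east, south],
        [east, north],
        [west, north],
        [west, south],
    ]
    return {
        "type": "Feature",
        "geometry": {"type": "Polygon", "coordinates": [ring]},
        "properties": {"prompt": prompt},  # hypothetical property name
    }


def detections_to_geojson(boxes: List[BBox], prompt: str) -> Dict:
    """Bundle every detection for one text prompt into a FeatureCollection."""
    return {
        "type": "FeatureCollection",
        "features": [box_to_feature(b, prompt) for b in boxes],
    }


if __name__ == "__main__":
    fc = detections_to_geojson([(-122.4195, 37.7745, -122.4190, 37.7750)], "swimming pools")
    print(json.dumps(fc, indent=2))
```

A FeatureCollection like this can be handed directly to most web map libraries as a layer source, which fits the layer-based UI described for the demo.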

Pipeline: select an area and a zoom level, split the region into mercantile tiles, run each tile with the prompt through a VLM, convert the predicted bounding boxes to geographic coordinates (WGS84), and render the results back on the map.
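The tiling and coordinate-conversion steps can be sketched with the standard Web Mercator ("slippy map") tile formulas. The project uses mercantile, whose `mercantile.tiles` and `mercantile.bounds` cover the same ground; this sketch reimplements the math with only the standard library so it runs on its own. It assumes 256 px tiles, a selection box that does not cross the antimeridian, and a VLM that returns boxes as (xmin, ymin, xmax, ymax) in tile pixels with the origin at the top-left. All function names here are hypothetical.

```python
import math
from typing import Iterator, Tuple

TILE_SIZE = 256  # assumption: standard 256x256 px XYZ tiles


def lonlat_to_tile(lon: float, lat: float, zoom: int) -> Tuple[int, int]:
    """XYZ tile indices of the tile containing a lon/lat point."""
    n = 2 ** zoom
    x = math.floor((lon + 180.0) / 360.0 * n)
    # asinh(tan(lat)) is the Mercator y; tile y grows southward.
    y = math.floor((1.0 - math.asinh(math.tan(math.radians(lat))) / math.pi) / 2.0 * n)
    # Clamp so points on the east/south edge (or beyond ~85.05 deg) stay on the grid.
    return min(max(x, 0), n - 1), min(max(y, 0), n - 1)


def tiles_covering(west: float, south: float, east: float, north: float,
                   zoom: int) -> Iterator[Tuple[int, int, int]]:
    """Yield (x, y, zoom) for every tile overlapping the selection box."""
    x0, y0 = lonlat_to_tile(west, north, zoom)  # top-left tile
    x1, y1 = lonlat_to_tile(east, south, zoom)  # bottom-right tile
    for x in range(x0, x1 + 1):
        for y in range(y0, y1 + 1):
            yield x, y, zoom


def tile_pixel_to_lonlat(x: int, y: int, zoom: int, px: float, py: float,
                         tile_size: int = TILE_SIZE) -> Tuple[float, float]:
    """Convert a pixel position inside tile (x, y, zoom) to WGS84 lon/lat."""
    n = 2 ** zoom
    gx = (x + px / tile_size) / n  # fraction of world width, 0..1
    gy = (y + py / tile_size) / n  # fraction of world height, 0..1 from the top
    lon = gx * 360.0 - 180.0
    lat = math.degrees(math.atan(math.sinh(math.pi * (1.0 - 2.0 * gy))))
    return lon, lat


def pixel_box_to_wgs84(tile: Tuple[int, int, int],
                       box_px: Tuple[float, float, float, float],
                       tile_size: int = TILE_SIZE) -> Tuple[float, float, float, float]:
    """Map a pixel box (xmin, ymin, xmax, ymax) to (west, south, east, north)."""
    x, y, z = tile
    xmin, ymin, xmax, ymax = box_px
    west, north = tile_pixel_to_lonlat(x, y, z, xmin, ymin, tile_size)  # top-left corner
    east, south = tile_pixel_to_lonlat(x, y, z, xmax, ymax, tile_size)  # bottom-right corner
    return west, south, east, north
```

Each tile from `tiles_covering` would be fetched and sent to the VLM with the prompt, and every returned pixel box passed through `pixel_box_to_wgs84`. The tile count roughly quadruples with each extra zoom level, which is where most of the VLM latency in this kind of pipeline comes from.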

It works reasonably well for distinct structures in a zero-shot setting. Occluded objects, however, are still better handled by specialized detectors such as YOLO models.

There is a public demo, and no login is required. I am mainly interested in feedback on detection quality, on the performance tradeoffs between VLMs and specialized detectors, and on potential real-world use cases.