Detect any object in satellite imagery using a text prompt

by eyasu6464·Mar 8, 2026·22 points·7 comments

AI Analysis

●● Solid · Big Brain · Niche Gem

Tile-based VLM inference with coordinate projection, but dense or occluded objects still call for a specialized YOLO model.

Strengths
  • Zero-shot object detection in satellite imagery without login or training
  • Clever mercantile tile slicing + WGS84 projection pipeline for geographic accuracy
  • Honest about limitations: dense/occluded structures where narrow models win
Weaknesses
  • Struggles in common real-world scenes (dense urban areas, occlusion) compared with specialized models
  • No clear differentiation from commercial satellite analytics (Planet, Maxar, Sentinel Hub integrations)
Category
Target Audience

Geospatial analysts, urban planners, satellite imagery researchers

Similar To

Planet Labs API · Maxar GeoTIFF analysis · Google Earth Engine

Post Description

I built a browser-based tool that uses Vision-Language Models (VLMs) to detect objects in satellite imagery via natural language prompts. Draw a polygon on the map, type what you want to find (e.g., "swimming pools," "oil tanks," "solar panels"), and the system scans tile-by-tile, projecting bounding boxes back onto the globe as GeoJSON. The pipeline: pick zoom level + prompt → slice map into mercantile tiles → feed each tile + prompt to VLM → create bounding boxes → project to WGS84 coordinates → render on map. No login required for the demo. Works well for distinct structures zero-shot; struggles with dense/occluded objects where narrow YOLO models still win.
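
The tiling and projection steps of that pipeline can be sketched roughly as follows. This is a minimal illustration, not the tool's actual code: it reimplements the standard XYZ (Web Mercator) tile math with the Python standard library instead of calling mercantile, assumes 256-pixel tiles, and takes the VLM as a hypothetical `detect` callback that returns pixel-space boxes for one tile. All function names here are made up for the example.

```python
import math

TILE_SIZE = 256  # assumption: 256x256 px tiles, the common XYZ default


def lonlat_to_tile(lon, lat, zoom):
    """Return (x, y) of the XYZ tile that contains a WGS84 point."""
    n = 2 ** zoom
    x = int((lon + 180.0) / 360.0 * n)
    y = int((1.0 - math.asinh(math.tan(math.radians(lat))) / math.pi) / 2.0 * n)
    # Clamp so points on the east/south edge stay inside the tile grid.
    return min(max(x, 0), n - 1), min(max(y, 0), n - 1)


def tiles_covering(west, south, east, north, zoom):
    """List the (x, y) tiles that cover a WGS84 bounding box."""
    x0, y0 = lonlat_to_tile(west, north, zoom)  # top-left tile
    x1, y1 = lonlat_to_tile(east, south, zoom)  # bottom-right tile
    return [(x, y) for x in range(x0, x1 + 1) for y in range(y0, y1 + 1)]


def tile_pixel_to_lonlat(x, y, zoom, px, py):
    """Project a pixel inside tile (x, y) back to WGS84 (lon, lat)."""
    n = 2 ** zoom
    fx = x + px / TILE_SIZE  # fractional tile coordinates
    fy = y + py / TILE_SIZE
    lon = fx / n * 360.0 - 180.0
    lat = math.degrees(math.atan(math.sinh(math.pi * (1.0 - 2.0 * fy / n))))
    return lon, lat


def box_to_feature(x, y, zoom, box, label):
    """Turn a pixel box (xmin, ymin, xmax, ymax), origin top-left, into GeoJSON."""
    xmin, ymin, xmax, ymax = box
    west, north = tile_pixel_to_lonlat(x, y, zoom, xmin, ymin)
    east, south = tile_pixel_to_lonlat(x, y, zoom, xmax, ymax)
    ring = [[west, south], [east, south], [east, north], [west, north], [west, south]]
    return {
        "type": "Feature",
        "properties": {"label": label},
        "geometry": {"type": "Polygon", "coordinates": [ring]},
    }


def scan(west, south, east, north, zoom, prompt, detect):
    """Run detect(x, y, zoom, prompt) on every tile and collect the results.

    `detect` is a hypothetical stand-in for the VLM call; it is expected to
    return a list of pixel boxes for the given tile.
    """
    features = []
    for x, y in tiles_covering(west, south, east, north, zoom):
        for box in detect(x, y, zoom, prompt):
            features.append(box_to_feature(x, y, zoom, box, prompt))
    return {"type": "FeatureCollection", "features": features}
```

In this sketch a caller would pass the bounding box of the drawn polygon plus a function that fetches the tile image and queries the model. Because boxes are projected per tile, an object straddling a tile boundary would show up as two partial boxes unless they are merged afterwards, which the sketch does not attempt.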

Similar Projects