Detect any object in satellite imagery using a text prompt
Tile-based VLM inference with coordinate projection, but dense objects still need YOLO.

VLM-based satellite detection sounds good until you remember YOLO and specialized models handle occlusion better.
Geospatial analysts, environmental researchers, urban planners, commercial imagery users
Planet Labs API · Esri ArcGIS · OpenCV + YOLO pipelines
Pipeline: select area and zoom level, split the region into mercantile tiles, run each tile with the prompt through a VLM, convert predicted bounding boxes to geographic coordinates (WGS84), and render the results back on the map.
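The tiling and coordinate-projection steps above can be sketched with the standard Web Mercator (XYZ) tile formulas. This is a minimal illustration, not the project's actual code: it assumes 256-pixel tiles and assumes the VLM returns boxes as (x0, y0, x1, y1) pixel coordinates inside a tile. The function names are hypothetical, and only the standard library is used, so no tiling library is needed.

```python
import math

TILE_SIZE = 256  # assumed tile edge length in pixels


def tile_to_lonlat(x, y, zoom):
    """Convert (possibly fractional) XYZ tile coordinates to WGS84 (lon, lat) in degrees."""
    n = 2 ** zoom
    lon = x / n * 360.0 - 180.0
    lat = math.degrees(math.atan(math.sinh(math.pi * (1 - 2 * y / n))))
    return lon, lat


def lonlat_to_tile(lon, lat, zoom):
    """Return the integer XYZ tile containing a WGS84 point, clamped to the valid range."""
    n = 2 ** zoom
    x = math.floor((lon + 180.0) / 360.0 * n)
    y = math.floor((1 - math.asinh(math.tan(math.radians(lat))) / math.pi) / 2 * n)
    return min(max(x, 0), n - 1), min(max(y, 0), n - 1)


def tiles_for_bbox(west, south, east, north, zoom):
    """List every (x, y) tile that covers the selected area at the given zoom."""
    x_min, y_min = lonlat_to_tile(west, north, zoom)  # tile y grows southward
    x_max, y_max = lonlat_to_tile(east, south, zoom)
    return [(x, y) for x in range(x_min, x_max + 1) for y in range(y_min, y_max + 1)]


def pixel_box_to_wgs84(tile_x, tile_y, zoom, box, tile_size=TILE_SIZE):
    """Project a pixel-space box inside one tile to (west, south, east, north) degrees."""
    x0, y0, x1, y1 = box
    # Pixel offsets become fractions of a tile, so the Mercator math stays exact.
    west, north = tile_to_lonlat(tile_x + x0 / tile_size, tile_y + y0 / tile_size, zoom)
    east, south = tile_to_lonlat(tile_x + x1 / tile_size, tile_y + y1 / tile_size, zoom)
    return west, south, east, north
```

Under these assumptions, each tile returned by tiles_for_bbox would be sent to the VLM with the prompt, and every predicted box would go through pixel_box_to_wgs84 before being drawn on the map. Boxes that straddle a tile boundary would still need separate handling, which this sketch does not attempt.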
It works reasonably well for distinct structures in a zero-shot setting. Occluded objects are still better handled by specialized detectors like YOLO models.
There is a public demo, and no login is required. I am mainly interested in feedback on detection quality, performance tradeoffs between VLMs and specialized detectors, and potential real-world use cases.
Open-vocabulary object detection exporting to YOLO format without login requirements.
AI object detection for text placement, but Canva and Adobe Express already do this better.
GPU-accelerated 30K object rendering is impressive, but the space tracking category already has Heavens-Above and N2YO.
Yet another prompt enhancer when PromptPerfect and dozens of AI wrappers already exist.
Fun concept, but another prompt-to-game wrapper in a crowded field.