Image prompt game with multi-signal CLIP/HSV/HOG scoring

by gosu94·Feb 12, 2026·1 point·1 comment

AI Analysis

●● Solid · Big Brain · Slick · Crowd Pleaser
The Take

The scoring model is the clever part: mixing semantic CLIP checks with a separate CLIP-based faithfulness test plus HSV and HOG-lite signals makes the feedback much harder to game than a single metric. The product wraps that into a tight, addictive loop (daily challenges, speed and unlimited modes, leaderboards), so it actually teaches you to write better prompts. Be aware that it still inherits CLIP's blind spots, and players could learn to optimize for the exact weighting scheme rather than for real similarity.

Category
Target Audience

Prompt engineers, AI/ML enthusiasts, generative-art creators, and anyone practicing text-to-image prompting

Post Description

Built this originally as a small competitive game, then it turned into a useful prompt-engineering practice loop.

Core mechanic: user sees a target image, writes a prompt, model generates output, and we score similarity.

Scoring uses multiple signals so one metric doesn’t dominate:

1. Semantic alignment (CLIP)
   - user_prompt -> target_image (is the prompt conceptually aligned with the target?)
   - user_image -> target_image (is the generated result semantically aligned with the target?)

2. Prompt faithfulness (CLIP)
   - user_prompt -> user_image (did generation actually follow the submitted prompt?)

3. Color similarity
   - HSV histogram overlap (user_image vs target_image) for palette/tone distribution

4. Structure similarity
   - HOG-lite gradient/orientation comparison (user_image vs target_image) for layout/edge composition
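
For readers who want something concrete, signals 3 and 4 can be sketched in plain Python. This is not the project's code: the bin counts, the single global orientation histogram (real HOG uses per-cell histograms with block normalization), the image representation (rows of `(r, g, b)` tuples) and every function name here are assumptions made for illustration.

```python
import colorsys
import math

# Assumed input: an image is a list of rows of (r, g, b) tuples, channels 0-255.

def hsv_histogram(pixels, h_bins=8, s_bins=4, v_bins=4):
    """Normalized joint HSV histogram, flattened to one list (bin counts are assumed)."""
    hist = [0.0] * (h_bins * s_bins * v_bins)
    count = 0
    for row in pixels:
        for r, g, b in row:
            h, s, v = colorsys.rgb_to_hsv(r / 255.0, g / 255.0, b / 255.0)
            hi = min(int(h * h_bins), h_bins - 1)
            si = min(int(s * s_bins), s_bins - 1)
            vi = min(int(v * v_bins), v_bins - 1)
            hist[(hi * s_bins + si) * v_bins + vi] += 1.0
            count += 1
    return [x / count for x in hist] if count else hist

def color_similarity(img_a, img_b):
    """Histogram intersection: 1.0 for identical palettes, 0.0 for disjoint ones."""
    a, b = hsv_histogram(img_a), hsv_histogram(img_b)
    return sum(min(x, y) for x, y in zip(a, b))

def orientation_histogram(pixels, bins=9):
    """Global gradient-orientation histogram, magnitude-weighted and L2-normalized."""
    gray = [[0.299 * r + 0.587 * g + 0.114 * b for r, g, b in row] for row in pixels]
    hist = [0.0] * bins
    for y in range(1, len(gray) - 1):
        for x in range(1, len(gray[0]) - 1):
            gx = gray[y][x + 1] - gray[y][x - 1]  # central differences
            gy = gray[y + 1][x] - gray[y - 1][x]
            mag = math.hypot(gx, gy)
            if mag == 0.0:
                continue
            angle = math.atan2(gy, gx) % math.pi  # unsigned orientation in [0, pi)
            hist[min(int(angle / math.pi * bins), bins - 1)] += mag
    norm = math.sqrt(sum(v * v for v in hist))
    return [v / norm for v in hist] if norm else hist

def structure_similarity(img_a, img_b):
    """Cosine similarity of the two orientation histograms, in [0, 1]."""
    a, b = orientation_histogram(img_a), orientation_histogram(img_b)
    return sum(x * y for x, y in zip(a, b))
```

Both functions return values in [0, 1], which makes them easy to feed into a weighted blend. A global orientation histogram discards where edges are, so a version that actually captures layout would compute it per grid cell and compare cell by cell.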

Final score is a weighted blend (content signals weighted highest), normalized to player-facing points.
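
A minimal sketch of such a blend follows. The post does not give the actual weights or point scale, so the numbers below, the signal names and the 1000-point maximum are all hypothetical; the only property taken from the description is that the CLIP-based content signals carry the most weight.

```python
import math

# Hypothetical weights (the post only says content signals are weighted highest).
WEIGHTS = {
    "prompt_to_target": 0.25,  # CLIP: user_prompt -> target_image
    "image_to_target": 0.35,   # CLIP: user_image -> target_image
    "prompt_to_image": 0.15,   # CLIP: user_prompt -> user_image (faithfulness)
    "color": 0.15,             # HSV histogram overlap
    "structure": 0.10,         # HOG-lite comparison
}
MAX_POINTS = 1000  # assumed player-facing scale

def cosine_similarity(u, v):
    """Cosine similarity between two embedding vectors (e.g. CLIP outputs)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def clamp01(x):
    """Clamp a signal into [0, 1] before blending."""
    return max(0.0, min(1.0, x))

def final_points(signals, weights=WEIGHTS, max_points=MAX_POINTS):
    """Weighted average of the clamped signals, scaled to integer points."""
    total_weight = sum(weights.values())
    blended = sum(weights[name] * clamp01(signals[name]) for name in weights)
    return round(blended / total_weight * max_points)
```

Dividing by the total weight keeps the result on the same scale if the weights are later retuned and no longer sum to 1. Raw CLIP cosines tend to cluster in a narrow band, so a real pipeline would likely rescale each signal to a calibrated range before blending rather than clamping it directly as done here.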

Why this approach:
- CLIP-only can overrate semantically related but visually off outputs
- color-only ignores structure/meaning
- structure-only misses semantics/style
- combining prompt-image and image-image signals reduced obvious false positives in ranking

Stack:
- Spring Boot backend
- separate CLIP scoring container
- external image generation service
- Next.js frontend
- PostgreSQL

Would love technical feedback on:
- metric weighting/calibration
- known failure modes I should benchmark
- alternatives to HOG-lite for fast structural scoring
