GitHub Repository

Clark Hash, 32x smaller searchable sketches for embeddings

7 stars · Rust

by stan_kirdey·May 27, 2026·1 point·0 comments

AI Analysis

●●● Banger · Big Brain · Wizardry

32x embedding compression with no calibration step, avoiding the training overhead that product quantization requires.

Strengths

•Stateless sparse-JL projection means zero corpus-wide calibration before storing vectors
•Asymmetric query sketches let database vectors stay bit-packed while queries remain floating-point
•Published paper sources and reproducible benchmarks on crates.io with docs.rs API docs
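
The first two strengths can be illustrated with a minimal sketch. This is not the crate's API: the function names, the SplitMix64-style hash, the `nnz` sparsity parameter, and the one-sign-bit-per-coordinate encoding are all assumptions made for illustration. It shows a sparse random projection derived purely from a seed (so nothing is fitted to the corpus), sign bits packed on the database side, and a query that stays in floating point when scored.

```rust
/// SplitMix64-style mixer used as a stateless hash (an illustrative choice).
fn mix(mut x: u64) -> u64 {
    x = x.wrapping_add(0x9E37_79B9_7F4A_7C15);
    x = (x ^ (x >> 30)).wrapping_mul(0xBF58_476D_1CE4_E5B9);
    x = (x ^ (x >> 27)).wrapping_mul(0x94D0_49BB_1331_11EB);
    x ^ (x >> 31)
}

/// Sparse random projection: each output coordinate sums `nnz` input
/// coordinates with random signs, all derived from `seed` by hashing.
fn project(v: &[f32], out_dim: usize, nnz: usize, seed: u64) -> Vec<f32> {
    (0..out_dim)
        .map(|i| {
            (0..nnz)
                .map(|s| {
                    let h = mix(seed ^ mix((i * nnz + s) as u64));
                    let idx = ((h >> 1) as usize) % v.len();
                    if h & 1 == 0 { v[idx] } else { -v[idx] }
                })
                .sum::<f32>()
        })
        .collect()
}

/// Database side: keep only the sign of each projected coordinate, bit-packed.
fn sketch(v: &[f32], out_dim: usize, nnz: usize, seed: u64) -> Vec<u8> {
    let projected = project(v, out_dim, nnz, seed);
    let mut bits = vec![0u8; (out_dim + 7) / 8];
    for (i, x) in projected.iter().enumerate() {
        if *x >= 0.0 {
            bits[i / 8] |= 1u8 << (i % 8);
        }
    }
    bits
}

/// Query side: the projection stays in f32 and is scored against packed bits.
/// A set bit contributes +q, a cleared bit contributes -q.
fn asymmetric_score(query_proj: &[f32], bits: &[u8]) -> f32 {
    query_proj
        .iter()
        .enumerate()
        .map(|(i, q)| if (bits[i / 8] >> (i % 8)) & 1 == 1 { *q } else { -*q })
        .sum::<f32>()
}
```

Under these assumptions, storing a vector means calling `sketch` once with a fixed seed, and querying means calling `project` on the query and then `asymmetric_score` against each stored sketch. Because the projection depends only on the seed, vectors can be sketched one at a time as they arrive.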

Weaknesses

•Only includes flat compressed-scan index; no HNSW or ANN integration for large-scale deployments
•Quality tradeoffs depend heavily on embedding model and sketch configuration parameters

Target Audience

ML engineers, vector database builders, edge AI developers

Similar To

Faiss product quantization · ScaNN · DiskANN

Post Description

Made a small library using GPT5.5-Pro and autoresearch.

It takes 384-dim f32 vectors from 1536 bytes to 48 bytes without calibration. Works for petabyte-scale processing of text in a purely online manner.
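
The sizes quoted in the post can be checked with a little arithmetic. The helper names below are hypothetical, and reading the 48-byte sketch as 384 packed bits is an inference from the numbers, not something the post states.

```rust
/// Bytes needed to store a dense vector of `dims` f32 values.
fn f32_bytes(dims: usize) -> usize {
    dims * std::mem::size_of::<f32>()
}

/// Bytes needed to store `bits` bits when packed eight to a byte.
fn packed_bytes(bits: usize) -> usize {
    (bits + 7) / 8
}

/// Compression ratio between the dense vector and a packed sketch.
fn ratio(dims: usize, sketch_bits: usize) -> usize {
    f32_bytes(dims) / packed_bytes(sketch_bits)
}
```

With 384 dimensions, `f32_bytes(384)` is 384 × 4 = 1536, `packed_bytes(384)` is 384 / 8 = 48, and the ratio is 1536 / 48 = 32, which matches the "32x" in the title.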

Similar Projects

Developer Tools · ● Mid

Imgfprint – deterministic image fingerprinting library for Rust

Yet another image hashing library when imagehash and phash already exist.

Niche Gem

bravo1goingdark

204mo ago

Infrastructure · ●●● Banger

See – searchable JSON compression, smaller than ZSTD (on our data)

Beats Zstd-19 on size, keeps JSON queryable without external indexes.

Big Brain · Wizardry

Tetsuro

315mo ago

Developer Tools · ●● Solid

SplatHash – A lightweight alternative to BlurHash and ThumbHash

16-byte image hash with 7.5ms faster decode than ThumbHash, but encode is 4x slower.

Big Brain · Wizardry

unsorted2270

61194mo ago

Design · ●● Solid

Inkwash, a watercolor sketching app and explanation

WebGL watercolor sim with live interactive demos embedded in the technical explanation.

Eye Candy · Cozy · Rabbit Hole

Yenrabbit

250291mo ago

Developer Tools · ●● Solid

Photon – Rust pipeline that embeds/tags/hashes images locally w SigLIP

Local SigLIP embeddings + 68K-term semantic tagging in a single Rust binary, zero cloud.

Wizardry · Niche Gem · Ship It

pgbouncer

314mo ago

Productivity · ●● Solid

Turn Bilibili favorites into a personal RAG knowledge base

Bilibili-specific RAG pipeline with fallback ASR for inaccessible audio URLs.

Solve My Problem · Niche Gem

via2026

205mo ago