GitHub Repository

Clark Hash, 32x smaller searchable sketches for embeddings

7 stars · Rust

by stan_kirdey·May 27, 2026·1 point·0 comments

AI Analysis

●●● Banger · Big Brain · Wizardry

32x embedding compression with no calibration step, avoiding the codebook training that product quantization requires.

Strengths
  • Stateless sparse-JL projection means zero corpus-wide calibration before storing vectors
  • Asymmetric query sketches let database vectors stay bit-packed while queries remain floating-point
  • Paper sources and reproducible benchmarks are published; the crate is on crates.io with API docs on docs.rs
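
To make the first two strengths concrete, here is a minimal Rust sketch of the general idea: a hash-seeded sparse random sign projection (in the spirit of sparse Johnson-Lindenstrauss transforms) whose output signs are bit-packed on the database side, with the query kept in floating point for scoring. Everything in it is an assumption for illustration: the function names, the SplitMix64-style hash, the `nnz` and `seed` parameters, and the scoring rule are hypothetical and are not taken from the Clark Hash crate or its paper.

```rust
// Illustrative only: all names and parameters are hypothetical, not the crate's API.

/// SplitMix64-style mixer: a stateless hash used to derive the projection on the fly.
fn mix(mut x: u64) -> u64 {
    x = x.wrapping_add(0x9E3779B97F4A7C15);
    x = (x ^ (x >> 30)).wrapping_mul(0xBF58476D1CE4E5B9);
    x = (x ^ (x >> 27)).wrapping_mul(0x94D049BB133111EB);
    x ^ (x >> 31)
}

/// Sparse random projection: each output coordinate sums `nnz` signed input entries
/// picked by hashing (seed, row, k). Nothing is learned from the corpus.
/// Assumes `v` is non-empty.
fn project(v: &[f32], out_dim: usize, nnz: usize, seed: u64) -> Vec<f32> {
    (0..out_dim)
        .map(|row| {
            (0..nnz)
                .map(|k| {
                    let h = mix(seed ^ mix((row * nnz + k) as u64));
                    let idx = (h >> 1) as usize % v.len();
                    let sign = if h & 1 == 0 { 1.0 } else { -1.0 };
                    sign * v[idx]
                })
                .sum::<f32>()
        })
        .collect()
}

/// Database side: keep only the sign of each projected coordinate, packed into u64 words.
fn sketch(v: &[f32], bits: usize, nnz: usize, seed: u64) -> Vec<u64> {
    let mut words = vec![0u64; (bits + 63) / 64];
    for (j, p) in project(v, bits, nnz, seed).iter().enumerate() {
        if *p >= 0.0 {
            words[j / 64] |= 1u64 << (j % 64);
        }
    }
    words
}

/// Query side: the query projection stays in f32 and is scored against packed bits
/// (a set bit counts as +1, a clear bit as -1).
fn asymmetric_score(query_proj: &[f32], words: &[u64]) -> f32 {
    query_proj
        .iter()
        .enumerate()
        .map(|(j, q)| if (words[j / 64] >> (j % 64)) & 1 == 1 { *q } else { -*q })
        .sum()
}

fn main() {
    let v: Vec<f32> = (0..384).map(|i| ((i * 37 % 101) as f32) - 50.0).collect();
    let neg: Vec<f32> = v.iter().map(|x| -x).collect();
    let (bits, nnz, seed) = (384, 8, 42);
    let q = project(&v, bits, nnz, seed);
    let own = asymmetric_score(&q, &sketch(&v, bits, nnz, seed));
    let opposite = asymmetric_score(&q, &sketch(&neg, bits, nnz, seed));
    println!("sketch bytes: {}", sketch(&v, bits, nnz, seed).len() * 8);
    println!("own: {own}, opposite: {opposite}");
}
```

Because the projection is recomputed from the seed, each vector can be sketched independently as it arrives, which is what makes a calibration pass over the corpus unnecessary in a scheme like this.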
Weaknesses
  • Only includes flat compressed-scan index; no HNSW or ANN integration for large-scale deployments
  • Quality tradeoffs depend heavily on embedding model and sketch configuration parameters
Category
Target Audience

ML engineers, vector database builders, edge AI developers

Similar To

Faiss product quantization · ScaNN · DiskANN

Post Description

made a small library using GPT5.5-Pro and autoresearch

you can take 384-dim f32 vectors from 1536 bytes down to 48 bytes without calibration. works for petabyte-scale processing of text in a purely online manner.
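
The size figures can be checked with a few lines of Rust. This is only the arithmetic from the post; the helper names are hypothetical, and the reading that a 48-byte sketch holds 384 bits (one bit per input dimension) is an assumption, not something the post states.

```rust
/// Bytes needed to store `dim` f32 values.
fn f32_bytes(dim: usize) -> usize {
    dim * std::mem::size_of::<f32>()
}

/// Bytes needed to store `bits` packed bits, rounded up to a whole byte.
fn sketch_bytes(bits: usize) -> usize {
    (bits + 7) / 8
}

fn main() {
    let dim = 384;
    let raw = f32_bytes(dim);
    // Assumption: the sketch keeps one bit per input dimension.
    let packed = sketch_bytes(dim);
    println!("{raw} -> {packed} bytes ({}x smaller)", raw / packed);
    // prints "1536 -> 48 bytes (32x smaller)"
}
```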

Similar Projects

Developer Tools · ●● Solid

Photon – Rust pipeline that embeds/tags/hashes images locally w SigLIP

Local SigLIP embeddings + 68K-term semantic tagging in a single Rust binary, zero cloud.

Wizardry · Niche Gem · Ship It
pgbouncer
313mo ago