Cuckoo-GPU – A 350x faster Bloom filter alternative for GPUs
350x faster GPU Bloom filter with academic paper backing the performance claims.
High-Performance GPU Super Bloom Filter
92× faster than CPU Super Bloom with minimizer-based shard selection.
Bioinformatics researchers, computational biologists working with genome sequences
cuCollections GBBF · GPU Cuckoo Filter · Counting Quotient Filter
50x faster than PaddleOCR Python with real TensorRT benchmarks on RTX 5090.
This reads like a GPU engineer's field notes: one ~3,400-line CUDA file implements a full per-thread crypto pipeline (key generation → EC multiply → SHA-256 → RIPEMD-160) and a two-stage matcher (Bloom filter, then binary search) that checks each batch of ~100M keys against ~3,100 targets. The article digs into concrete low-level choices (LUT layout, memory hierarchy, __ldg reads, atomicCAS reporting, and per-mode keygen strategies), which is rare in public writeups. The downsides: it is closed-source, and the dual-use and ethical implications should be called out more explicitly.
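The two-stage matcher mentioned above can be illustrated with a minimal CPU-side sketch. Since the project is closed-source, this is not its implementation: the class names (BloomFilter, TwoStageMatcher), the sizing parameters (bits_per_item, num_hashes), and the SHA-256-based double hashing are all assumptions chosen for illustration. The idea is only that a cheap probabilistic filter rejects almost every candidate, and the few survivors are confirmed by an exact binary search over a sorted target list.

```python
import bisect
import hashlib


class BloomFilter:
    """Bit-array Bloom filter; probe positions come from double hashing (illustrative choice)."""

    def __init__(self, num_bits: int, num_hashes: int) -> None:
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, item: bytes):
        # Derive two 64-bit values from one digest, then step h1 by h2.
        digest = hashlib.sha256(item).digest()
        h1 = int.from_bytes(digest[:8], "little")
        h2 = int.from_bytes(digest[8:16], "little") | 1  # force an odd step
        for i in range(self.num_hashes):
            yield (h1 + i * h2) % self.num_bits

    def add(self, item: bytes) -> None:
        for pos in self._positions(item):
            self.bits[pos >> 3] |= 1 << (pos & 7)

    def might_contain(self, item: bytes) -> bool:
        # False means "definitely absent"; True means "possibly present".
        return all(
            self.bits[pos >> 3] & (1 << (pos & 7)) for pos in self._positions(item)
        )


class TwoStageMatcher:
    """Stage 1: Bloom filter pre-check. Stage 2: exact binary search."""

    def __init__(self, targets, bits_per_item: int = 16, num_hashes: int = 4) -> None:
        # Hypothetical sizing parameters, not taken from the article.
        self.sorted_targets = sorted(set(targets))
        num_bits = max(8, bits_per_item * len(self.sorted_targets))
        self.bloom = BloomFilter(num_bits, num_hashes)
        for target in self.sorted_targets:
            self.bloom.add(target)

    def contains(self, candidate: bytes) -> bool:
        if not self.bloom.might_contain(candidate):
            return False  # the common case: rejected without touching the list
        # Bloom hits can be false positives, so confirm exactly.
        i = bisect.bisect_left(self.sorted_targets, candidate)
        return i < len(self.sorted_targets) and self.sorted_targets[i] == candidate
```

With a few thousand targets the sorted list is small, so the exact stage costs about a dozen comparisons per surviving candidate. The Bloom stage matters because nearly all candidates miss, and a miss is resolved with a handful of bit probes.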
CUDA pipeline hits 60 FPS on 45MP RAW files, competing with Darktable.
Direct2D GPU PDF renderer with CPU fallback, but alpha-stage and Windows-only.