GitHub Repository

Optimal open-addressing hash maps (Elastic Hashing & Funnel Hashing) in Rust, with Python bindings.

7 starsRust

Opthash – Rust implementations of Elastic and Funnel hashing

Name: Opthash – Rust implementations of Elastic and Funnel hashing
Availability: InStock
Author: aayd

by aayd·Jun 2, 2026·4 points·0 comments

Visit Project View on HN

AI Analysis

●●SolidNiche GemBig Brain

Implements 2025 hashing research with arena layout optimization hashbrown never tried.

Strengths

•Flat arena layout with single allocation beats nested structs for cache locality
•SIMD control-byte scans and 7-bit fingerprints mirror hashbrown's proven techniques
•Python bindings extend reach beyond Rust ecosystem for broader testing

Weaknesses

•Hash maps are solved—hashbrown and std::HashMap already dominate production use
•Six stars suggests narrow appeal beyond systems programming specialists

Post Description

I first came across the paper “Optimal Bounds for Open Addressing Without Reordering” through a Quanta Magazine YouTube video a few months back. I went looking for an official implementation and couldn’t find one, so I decided to try implementing the paper's Elastic Hashing and Funnel Hashing in Rust.

To that end, I build opthash, a Rust library providing ElasticHashMap and FunnelHashMap implementations (and more recently HashSet variants). They are at API parity with std::collections::HashMap and HashSet.

Initially, the data layout was a relatively straightforward implementation of the paper, with levels and buckets represented as nested structs that each managed their own allocation. Later, I moved to a flatter arena layout with one allocation for the backing control/data regions, and smaller descriptor structs holding pointers and metadata for each level. That ended up being noticeably faster. The intuition I went for is to arrange control bytes contiguously to maximize cache locality, since majority of instructions for probing was done on the control bytes.

Most the low-level details is inspired by hashbrown/SwissTable, such as 7-bit control bytes, SIMD scans over groups of control bytes, power-of-two sizing, test cases, and foldhash as the default hasher. hashbrown is included in the benchmarks as the performance ceiling, and std::HashMap is the baseline.

I also added Python bindings as a learning exercise. I aimed for parity with Python dict and set, but I quickly realized crossing through PyO3 adds more overhead than expected...

I would gladly appreciate feedback, especially about hash table construction, PyO3 bindings, or benchmarking methodology. I have learned a lot about Rust's language features (and crossing into unsafe territory) from this project, and I'm sure there are still many things to improve.