Back to browse
GitHub Repository

Optimal open-addressing hash maps (Elastic Hashing & Funnel Hashing) in Rust, with Python bindings.

7 starsRust

Opthash – Rust implementations of Elastic and Funnel hashing

by aayd·Jun 2, 2026·4 points·0 comments

AI Analysis

●●SolidNiche GemBig Brain

Implements 2025 hashing research with arena layout optimization hashbrown never tried.

Strengths
  • Flat arena layout with single allocation beats nested structs for cache locality
  • SIMD control-byte scans and 7-bit fingerprints mirror hashbrown's proven techniques
  • Python bindings extend reach beyond Rust ecosystem for broader testing
Weaknesses
  • Hash maps are solved—hashbrown and std::HashMap already dominate production use
  • Six stars suggests narrow appeal beyond systems programming specialists
Target Audience

Systems programmers, Rust developers working on performance-critical data structures

Similar To

hashbrown · std::collections::HashMap · swisstable

Post Description

I first came across the paper “Optimal Bounds for Open Addressing Without Reordering” through a Quanta Magazine YouTube video a few months back. I went looking for an official implementation and couldn’t find one, so I decided to try implementing the paper's Elastic Hashing and Funnel Hashing in Rust.

To that end, I build opthash, a Rust library providing ElasticHashMap and FunnelHashMap implementations (and more recently HashSet variants). They are at API parity with std::collections::HashMap and HashSet.

Initially, the data layout was a relatively straightforward implementation of the paper, with levels and buckets represented as nested structs that each managed their own allocation. Later, I moved to a flatter arena layout with one allocation for the backing control/data regions, and smaller descriptor structs holding pointers and metadata for each level. That ended up being noticeably faster. The intuition I went for is to arrange control bytes contiguously to maximize cache locality, since majority of instructions for probing was done on the control bytes.

Most the low-level details is inspired by hashbrown/SwissTable, such as 7-bit control bytes, SIMD scans over groups of control bytes, power-of-two sizing, test cases, and foldhash as the default hasher. hashbrown is included in the benchmarks as the performance ceiling, and std::HashMap is the baseline.

I also added Python bindings as a learning exercise. I aimed for parity with Python dict and set, but I quickly realized crossing through PyO3 adds more overhead than expected...

I would gladly appreciate feedback, especially about hash table construction, PyO3 bindings, or benchmarking methodology. I have learned a lot about Rust's language features (and crossing into unsafe territory) from this project, and I'm sure there are still many things to improve.

Similar Projects