WhiskeySour – A 10x faster drop-in replacement for BeautifulSoup

by ayas_behera · Apr 25, 2026 · 8 points · 1 comment

AI Analysis

●●● Banger · Wizardry · Solve My Problem

Rust-powered BeautifulSoup with 10x speed and full API compatibility.

Strengths

• Drop-in replacement requires only changing the import line in existing code.
• Arena allocation reduces memory from 500 bytes to 40 bytes per node.
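
The arena-allocation point can be illustrated with a minimal Python sketch. It is not WhiskeySour's actual Rust layout, which this page does not describe; `NodeArena` and its fields are hypothetical names chosen for illustration. The idea is that nodes live as rows in a few shared arrays and refer to each other by integer index, instead of each node being a separately allocated object.

```python
from array import array


class NodeArena:
    """Tree nodes stored as rows in parallel arrays (hypothetical layout).

    A node is just an integer index into these arrays; -1 means "none".
    """

    def __init__(self):
        self.tags = []                  # tag name per node
        self.parent = array("i")        # parent index per node
        self.first_child = array("i")   # first child index per node
        self.last_child = array("i")    # last child index per node
        self.next_sibling = array("i")  # next sibling index per node

    def add(self, tag, parent=-1):
        """Append a node as the last child of `parent` and return its index."""
        idx = len(self.tags)
        self.tags.append(tag)
        self.parent.append(parent)
        self.first_child.append(-1)
        self.last_child.append(-1)
        self.next_sibling.append(-1)
        if parent != -1:
            last = self.last_child[parent]
            if last == -1:
                self.first_child[parent] = idx
            else:
                self.next_sibling[last] = idx
            self.last_child[parent] = idx
        return idx

    def children(self, idx):
        """Yield the child indices of node `idx` in document order."""
        child = self.first_child[idx]
        while child != -1:
            yield child
            child = self.next_sibling[child]
```

Building `html > body > p, p` is four `add` calls, and walking children only follows integer links. The per-node figures quoted above come from the project's own claims and are not reproduced by this sketch.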

Weaknesses

• Rust dependency adds build complexity for pure Python environments.
• Needs real-world production testing beyond the 450 passing tests.

Post Description

The Problem

I’ve been using BeautifulSoup for some time. It’s the standard for ease of use in Python scraping, but it almost always becomes the performance bottleneck when processing large-scale datasets.

Parsing complex or massive HTML trees in Python typically suffers from high memory allocation costs and the overhead of the Python object model during tree traversal. In my production scraping workloads, the parser was consuming more CPU cycles than the network I/O. lxml is fast, but it also uses a lot of memory when processing large documents and can have trouble with malformed HTML.

The Solution

I wanted to keep the API compatibility that makes BS4 great but eliminate the overhead that slows down high-volume pipelines. That’s why I built WhiskeySour. It uses html5ever. And yes… I *vibe coded the whole thing*.

WhiskeySour is a drop-in replacement. You should be able to swap "from bs4 import BeautifulSoup" for "from whiskeysour import WhiskeySour" and see immediate speedups. Workflows that used to take more than 30 minutes might now take less than 5.
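
The import swap can be sketched so that existing code keeps working when WhiskeySour is not installed. This is a minimal sketch, not taken from the project's docs: the module and class names come from the post, while `load_soup_class` and `extract_links` are hypothetical helpers, and it is an assumption that `WhiskeySour` accepts the same constructor arguments as `BeautifulSoup`.

```python
import importlib

# (module, class) pairs in order of preference. The first pair is the
# import named in the post; the second is the usual bs4 import.
CANDIDATES = [("whiskeysour", "WhiskeySour"), ("bs4", "BeautifulSoup")]


def load_soup_class(candidates=CANDIDATES):
    """Return the first importable parser class, or None if none is installed."""
    for module_name, class_name in candidates:
        try:
            module = importlib.import_module(module_name)
        except ImportError:
            continue
        cls = getattr(module, class_name, None)
        if cls is not None:
            return cls
    return None


def extract_links(html, soup_class):
    """Collect href values using the bs4-style API.

    Assumes soup_class(html, "html.parser") and find_all("a") behave as they
    do in bs4; for WhiskeySour this rests on the post's compatibility claim.
    """
    soup = soup_class(html, "html.parser")
    return [a.get("href") for a in soup.find_all("a")]
```

Calling `load_soup_class()` once at startup and passing the result to `extract_links` leaves the rest of a pipeline untouched, which also makes it easy to switch back if a regression shows up.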

I have shared the detailed architecture of the library here: https://the-pro.github.io/whiskeySour/architecture/

Here is the benchmark report against bs4 with html.parser: https://the-pro.github.io/whiskeySour/bench-report/

Here is the link to the repo: https://github.com/the-pro/WhiskeySour

Why I’m sharing this

I’m looking for feedback from the community on two fronts:

1. Edge cases: If you have particularly messy or malformed HTML that BS4 handles well, I’d love to know if WhiskeySour encounters any regressions.

2. Benchmarks: If you are running high-volume parsers, I’d appreciate it if you could run a test on your own datasets and share the results.
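
Both requests above can be sketched as one small harness. This is a sketch under stated assumptions rather than the project's own benchmark code: `find_mismatches` and `median_seconds` are hypothetical helpers, and they take plain callables so that no particular parser API is assumed.

```python
import statistics
import time


def find_mismatches(extract_a, extract_b, snippets):
    """Return (snippet, result_a, result_b) for each snippet where results differ."""
    mismatches = []
    for snippet in snippets:
        result_a = extract_a(snippet)
        result_b = extract_b(snippet)
        if result_a != result_b:
            mismatches.append((snippet, result_a, result_b))
    return mismatches


def median_seconds(parse, documents, repeats=5):
    """Return the median wall-clock time, in seconds, of parsing all documents once."""
    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        for document in documents:
            parse(document)
        samples.append(time.perf_counter() - start)
    return statistics.median(samples)
```

For the edge-case check, each callable could wrap one library, for example a function that returns `BeautifulSoup(html, "html.parser").get_text()` and a counterpart built on `WhiskeySour`, assuming it exposes the same call. For timing, pass the same two callables to `median_seconds` with your own documents and compare the two numbers.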
