BitVanes – A zero-trust RAG pipeline engine in Rust, WASM, and Arrow

Author: kodr_pro

by kodr_pro · Jun 24, 2026 · 2 points · 0 comments

AI Analysis

●●● Banger · Zero to One · Big Brain · Wizardry

Runs the entire RAG preprocessing pipeline in the browser via WASM, so no data ever leaves your machine.

Strengths

•Zero-trust architecture means sensitive documents never touch a server during parsing or chunking.
•Apache Arrow zero-copy output avoids JSON serialization overhead on the data path.
•Supports six OpenAI tokenizers, with structure-aware splitting that respects heading and code-block boundaries.
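
To make the structure-aware splitting idea concrete, here is a minimal std-only Rust sketch. It is not BitVanes code: the function names (`approx_tokens`, `structural_blocks`, `chunk`) are hypothetical, the input is assumed to be Markdown-like text, and token counts are approximated by whitespace-separated words rather than by a real OpenAI tokenizer.

```rust
/// Rough stand-in for a tokenizer: counts whitespace-separated words.
/// (Assumption for illustration; a real pipeline would use a BPE tokenizer.)
fn approx_tokens(s: &str) -> usize {
    s.split_whitespace().count()
}

/// Group lines into structural blocks. A heading starts a new block, and a
/// fenced code block is always kept together as one block.
fn structural_blocks(text: &str) -> Vec<String> {
    let mut blocks: Vec<String> = Vec::new();
    let mut current = String::new();
    let mut in_fence = false;

    for line in text.lines() {
        let is_fence = line.trim_start().starts_with("```");
        let is_heading = !in_fence && line.starts_with('#');

        // A heading or an opening fence closes whatever block came before it.
        if (is_heading || (is_fence && !in_fence)) && !current.trim().is_empty() {
            blocks.push(std::mem::take(&mut current));
        }
        current.push_str(line);
        current.push('\n');

        if is_fence {
            // A closing fence ends the code block it belongs to.
            if in_fence {
                blocks.push(std::mem::take(&mut current));
            }
            in_fence = !in_fence;
        }
    }
    if !current.trim().is_empty() {
        blocks.push(current);
    }
    blocks
}

/// Pack whole blocks into chunks of at most `max_tokens` approximate tokens.
/// A single block larger than the budget becomes its own oversized chunk
/// instead of being cut in the middle.
fn chunk(text: &str, max_tokens: usize) -> Vec<String> {
    let mut chunks = Vec::new();
    let mut current = String::new();
    let mut used = 0;

    for block in structural_blocks(text) {
        let cost = approx_tokens(&block);
        if used > 0 && used + cost > max_tokens {
            chunks.push(std::mem::take(&mut current));
            used = 0;
        }
        current.push_str(&block);
        used += cost;
    }
    if !current.is_empty() {
        chunks.push(current);
    }
    chunks
}
```

Calling `chunk(text, 8)` on a short document with an intro section, a fenced code block, and a second heading would be expected to emit the code block as a single chunk even though it exceeds the budget, since the sketch only ever cuts at block boundaries. Splitting oversized blocks further (by sentence, for example) is left out on purpose.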

Weaknesses

•As the author's first Arrow implementation, it may miss edge cases that mature parsing libraries already handle.
•The PDF.js dependency adds to the bundle size for PDF-heavy workflows.

Post Description

Most RAG pipelines ship raw, sensitive documents over the wire to cloud services just to get them parsed, scrubbed of PII, chunked, and vectorized. BitVanes is a zero-trust, local-first ETL engine designed to solve this. It’s written in Rust, spits out Apache Arrow RecordBatches, and compiles to both a native CLI and WebAssembly, so you can run the entire pipeline directly in a browser sandbox. The WASM version is at the posted URL; the core and CLI are on GitHub.
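
For readers unfamiliar with why Arrow output can be handed over without a serialization step, the sketch below imitates the offsets-plus-values layout that the Arrow columnar format uses for variable-length string columns. It is a std-only illustration, not the `arrow` crate's API and not BitVanes code; the `Utf8Column` type and its methods are hypothetical names made up for this example, and validity bitmaps (nulls) are omitted.

```rust
/// Hypothetical stand-in for an Arrow-style string column.
struct Utf8Column {
    /// Row `i` occupies bytes `offsets[i]..offsets[i + 1]`, so there is
    /// always one more offset than there are rows.
    offsets: Vec<i32>,
    /// All row bytes stored back to back in one contiguous buffer.
    values: Vec<u8>,
}

impl Utf8Column {
    /// Build the column by appending each row's bytes to the shared buffer
    /// and recording where it ends.
    fn from_rows(rows: &[&str]) -> Self {
        let mut offsets = vec![0i32];
        let mut values = Vec::new();
        for row in rows {
            values.extend_from_slice(row.as_bytes());
            offsets.push(values.len() as i32);
        }
        Utf8Column { offsets, values }
    }

    /// Number of rows in the column.
    fn len(&self) -> usize {
        self.offsets.len() - 1
    }

    /// Borrow row `i` directly from the shared buffer: no allocation and
    /// no per-row copy, just two offset lookups and a slice.
    fn get(&self, i: usize) -> &str {
        let start = self.offsets[i] as usize;
        let end = self.offsets[i + 1] as usize;
        std::str::from_utf8(&self.values[start..end]).expect("rows were valid UTF-8")
    }
}
```

Because every chunk lives in one flat buffer described by a second flat buffer of offsets, a consumer can read rows in place, which is the property a JSON data path lacks. In a real pipeline the `arrow` crate's array and RecordBatch types would take the place of this struct.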

I'd love to get your thoughts on the architecture, particularly the use of Arrow (it's my first time using Apache Arrow; I'm coming from Cap'n Proto) and the Rust-to-JS design for PDFs, which is meant to keep the WASM package size reasonable.

I'd like to publish the package to crates.io once some people have kicked the tires and I've ironed out the issues.