
⚡ Production-grade RAG chunking engine powered by Rust. Process GBs of CSV, PDF, JSON, JSONL, DOCX, XLSX, URLs, and more in seconds with O(1) memory. 40x faster than LangChain.

48 stars · Python

Rust-powered document chunker for RAG – 40x faster, O(1) memory

Author: kriralabs

by kriralabs · Feb 28, 2026 · 14 points · 3 comments


AI Analysis

●● Solid · Wizardry · Ship It

Rust core beats LangChain's Python bottleneck, but chunking alone won't move the needle.

Strengths

• Genuine performance win: 40x faster with O(1) memory is a measurable claim, not marketing-speak, backed by a credible benchmark (42M chunks in 113 s)
• Drop-in Python API means adopting it requires changing one import line, not rewriting pipelines
• Handles multiple formats (CSV, PDF, JSON, DOCX, XLSX, URLs): a wider input scope than most chunkers
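Taking the quoted benchmark at face value, the arithmetic below converts it into a throughput figure. The "42M" is presumably rounded, and the implied LangChain time assumes the 40x ratio applies to this same workload, which the post does not state.

```python
# Back-of-envelope throughput from the quoted benchmark (42M chunks in 113 s).
chunks = 42_000_000  # "42M", presumably rounded
seconds = 113

rate = chunks / seconds
print(f"{rate:,.0f} chunks/s")  # 371,681 chunks/s

# Assumption: the 40x speedup was measured on this same workload.
speedup = 40
baseline_seconds = seconds * speedup
print(f"implied baseline: {baseline_seconds} s (about {baseline_seconds / 60:.0f} min)")
```

That is roughly 372K chunks per second; throughput in bytes would depend on chunk size, which the benchmark summary does not give.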

Weaknesses

• Chunking is table stakes in RAG pipelines; speed matters but doesn't fix downstream LLM hallucination
• No comparison to other fast chunkers (LlamaIndex, Unstructured.io), and no production stability data beyond "shipped 17 versions"

Post Description

I built a document chunking library for RAG pipelines with a Rust core and Python bindings.

The problem: LangChain's chunker is pure Python and becomes a bottleneck at scale — slow and memory-hungry on large document sets.

What Krira Chunker does differently:
- Rust-native processing: 40x faster than LangChain's implementation
- O(1) space complexity: memory stays flat regardless of document size
- Drop-in Python API: works with any existing RAG pipeline
- Production-ready: 17 versions shipped, 315+ installs

pip install krira-augment

Would love brutal feedback from anyone building RAG systems — what chunking problems are you running into that this doesn't solve yet?
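The O(1)-memory claim is easiest to see in a streaming design. Instead of reading a whole file into memory and then splitting it, the chunker reads just enough text to fill the next window, so peak memory depends on the chunk size, not the document size. The sketch below is plain Python and only illustrates that idea. It is not Krira's Rust core or its API, and `stream_chunks`, `chunk_size`, and `overlap` are hypothetical names. Fixed-size character windows with overlap are also an assumption; the post does not say which splitting strategy Krira uses.

```python
import io
from typing import Iterator, TextIO


def stream_chunks(src: TextIO, chunk_size: int = 1000, overlap: int = 100) -> Iterator[str]:
    """Yield fixed-size character chunks, each sharing `overlap` chars with the previous one.

    Hypothetical helper for illustration. Only one window of about `chunk_size`
    characters is held at a time, so peak memory does not grow with input length.
    """
    if not 0 <= overlap < chunk_size:
        raise ValueError("require 0 <= overlap < chunk_size")
    step = chunk_size - overlap  # new characters consumed per chunk

    buf = src.read(chunk_size)
    while buf:
        yield buf
        # A short read means end of stream (holds for blocking file and StringIO streams).
        if len(buf) < chunk_size:
            break
        new = src.read(step)
        if not new:  # input ended exactly at a chunk boundary
            break
        # Carry the last `overlap` characters forward, then append fresh text.
        buf = buf[step:] + new


if __name__ == "__main__":
    text = io.StringIO("abcdefghijklmnopqrstuvwxyz")
    print(list(stream_chunks(text, chunk_size=10, overlap=3)))
    # ['abcdefghij', 'hijklmnopq', 'opqrstuvwx', 'vwxyz']
```

For a real file, pass `open(path, encoding="utf-8")` instead of `io.StringIO`. As long as the caller consumes the generator lazily rather than collecting it with `list`, memory stays bounded by `chunk_size`.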

Similar Projects

AI/ML · ●● Solid

RAG chunking playground: visualize how your docs get split

Visual chunking comparison beats guessing — export production-ready code.

Solve My Problem · Niche Gem

Horatius77

10 · 1mo ago

AI/ML · ●● Solid

A tool to create and evaluate document processing pipelines for RAG

LLM-as-judge metrics beat guessing chunk sizes, but Ragas and LangSmith already exist.

Solve My Problem · Slick

martimchaves

20 · 2mo ago

Data · ●●● Banger

NRC nuclear licensing RAG pipeline and regulatory embeddings dataset

First public NRC regulatory embeddings dataset—37K chunks ready for ChromaDB and Pinecone.

Niche Gem · Solve My Problem

davenporten

20 · 2mo ago

AI/ML · ●● Solid

Wax – RAG in a single file (SQLite for AI memory)

Exports a one-file 'brain' and a tiny MemoryOrchestrator API (remember/recall) so you can ditch Docker and hosted vector DBs — token-budgeted, deterministic recall and kill-9-safe durability are concrete wins. The Metal-accelerated vector search plus SQLite FTS5 fallback shows real engineering heft, but it's clearly tuned for the Apple ecosystem and the author is still asking for retrieval/eval feedback.

Wizardry · Niche Gem

karc14

20 · 4mo ago

Developer Tools · ●● Solid

LiteParse v2, now in Rust 100x faster

Rust rewrite with PDFium delivers 100x speedup over the Python v1.

Slick · Solve My Problem

pierre

150 · 16d ago

Developer Tools · ●●● Banger

Warp_cache – SIEVE cache in Rust for Python, 25x faster than cachetools

SIEVE cache beats LRU with one-line swap, but only matters if you're bottlenecked on cache.

Wizardry · Big Brain

tolopalmer

20 · 3mo ago