GitHub Repository

⚡ Production-grade RAG chunking engine powered by Rust. Process GBs of CSV, PDF, JSON, JSONL, DOCX, XLSX, URLs, and more in seconds with O(1) memory. 40x faster than LangChain.

48 stars · Python

Rust-powered document chunker for RAG – 40x faster, O(1) memory

by kriralabs·Feb 28, 2026·14 points·3 comments

AI Analysis

●● Solid · Wizardry · Ship It

Rust core beats LangChain's Python bottleneck, but chunking alone won't move the needle.

Strengths
  • Genuine performance win: the 40x speedup and O(1) memory claims are measurable rather than marketing-speak, backed by a credible benchmark of 42M chunks in 113s (roughly 372,000 chunks per second)
  • Drop-in Python API means adopting it requires changing one import line, not rewriting pipelines
  • Handles multiple formats (CSV, PDF, JSON, DOCX, XLSX, URLs) — wider input scope than most chunkers
Weaknesses
  • Chunking is table stakes in RAG pipelines; speed matters but doesn't fix downstream LLM hallucination
  • No comparison to other fast chunkers (LlamaIndex, Unstructured.io), and no production stability data beyond 'shipped 17 versions'
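
One way to close the comparison gap noted above is a small, chunker-agnostic timing harness. The sketch below is illustrative only: `naive_chunker` and `benchmark` are hypothetical names, not part of Krira Chunker, LangChain, LlamaIndex, or Unstructured.io, and the baseline is a deliberately simple pure-Python splitter. Note that `tracemalloc` only sees allocations made through Python's allocator, so memory used inside a native (Rust) extension would need an external tool to measure.

```python
import time
import tracemalloc
from typing import Callable, Iterable, Tuple


def naive_chunker(text: str, size: int = 512) -> Iterable[str]:
    """Hypothetical pure-Python baseline: fixed-size character chunks."""
    for start in range(0, len(text), size):
        yield text[start:start + size]


def benchmark(chunker: Callable[[str], Iterable[str]], text: str) -> Tuple[int, float, int]:
    """Return (chunk_count, chunks_per_second, peak_python_bytes) for one run."""
    tracemalloc.start()
    started = time.perf_counter()
    # Consume the chunker lazily so chunks are counted without being stored.
    count = sum(1 for _ in chunker(text))
    elapsed = time.perf_counter() - started
    _, peak = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    rate = count / elapsed if elapsed > 0 else float("inf")
    return count, rate, peak


if __name__ == "__main__":
    sample = "lorem ipsum " * 100_000
    count, rate, peak = benchmark(naive_chunker, sample)
    print(f"{count} chunks, {rate:,.0f} chunks/s, peak {peak / 1e6:.1f} MB")
    # For scale, the figure cited in Strengths is 42M chunks in 113s.
    print(f"cited rate: {42_000_000 / 113:,.0f} chunks/s")
```

Any other chunker can be passed in the same way, as long as it is wrapped in a callable that takes a string and returns an iterable of chunks. Results are only comparable when every candidate runs on the same input, with the same chunk size, on the same machine.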
Target Audience

RAG/LLM engineers, ML ops teams, vector database users

Similar To

LangChain RecursiveCharacterTextSplitter · LlamaIndex SimpleNodeParser · Unstructured.io

Post Description

I built a document chunking library for RAG pipelines with a Rust core and Python bindings.

The problem: LangChain's chunker is pure Python and becomes a bottleneck at scale — slow and memory-hungry on large document sets.

What Krira Chunker does differently:
- Rust-native processing: 40x faster than LangChain's implementation
- O(1) space complexity: memory stays flat regardless of document size
- Drop-in Python API: works with any existing RAG pipeline
- Production-ready: 17 versions shipped, 315+ installs

pip install krira-augment
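
To make the "memory stays flat" claim concrete, here is a minimal sketch of streaming chunking in plain Python. It is not Krira Chunker's implementation or API: `stream_chunks` and its parameters are hypothetical, and it splits on character counts only. The point is that the working buffer is bounded by `chunk_size + read_size`, so peak memory does not grow with the size of the input.

```python
import io
from typing import Iterator, TextIO


def stream_chunks(reader: TextIO, chunk_size: int = 512, overlap: int = 64,
                  read_size: int = 64 * 1024) -> Iterator[str]:
    """Yield overlapping fixed-size character chunks from a text stream."""
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must satisfy 0 <= overlap < chunk_size")
    step = chunk_size - overlap
    buffer = ""
    emitted = False
    while True:
        data = reader.read(read_size)
        if not data:
            break
        buffer += data
        # Emit every full chunk available, keeping only the overlap tail
        # plus any text that has not been emitted yet.
        while len(buffer) >= chunk_size:
            yield buffer[:chunk_size]
            emitted = True
            buffer = buffer[step:]
    # Flush a final short chunk only if it holds text not already emitted.
    if buffer and (len(buffer) > overlap or not emitted):
        yield buffer


# Usage: any file-like text object works; StringIO stands in for a large file.
for chunk in stream_chunks(io.StringIO("abcdefghij"), chunk_size=4, overlap=1):
    print(chunk)
```

Because the function is a generator, a caller can pass each chunk straight to an embedding or indexing step without ever materializing the full list.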

Would love brutal feedback from anyone building RAG systems — what chunking problems are you running into that this doesn't solve yet?

Similar Projects

AI/ML · ●● Solid

Wax – RAG in a single file (SQLite for AI memory)

Exports a one-file 'brain' and a tiny MemoryOrchestrator API (remember/recall) so you can ditch Docker and hosted vector DBs — token-budgeted, deterministic recall and kill-9-safe durability are concrete wins. The Metal-accelerated vector search plus SQLite FTS5 fallback shows real engineering heft, but it's clearly tuned for the Apple ecosystem and the author is still asking for retrieval/eval feedback.

Wizardry · Niche Gem
karc14
20 · 4mo ago