GitHub Repository

High-performance, declarative stream processor for delimited text data.

22 stars · Rust

Grab – A declarative stream processor for delimited text data

by anwitars · Mar 13, 2026 · 2 points · 0 comments

AI Analysis

●● Solid · Cozy · Solve My Problem

Declarative schema mapping replaces cryptic awk $1 column indexing with named fields.

Strengths
  • Processes 17.1M fields per second with a single static binary under 800KB.
  • Named field mapping syntax like phones:2 creates arrays without regex gymnastics.
  • Strict UTF-8 validation and schema enforcement catch malformed data early.
Weaknesses
  • Text processing space already has miller, xsv, and csvkit serving similar needs.
  • Learning curve for mapping syntax may not justify switching from familiar awk patterns.
Target Audience

DevOps engineers and data pipeline builders

Similar To

miller · xsv · csvkit

Post Description

I built grab because I wanted a more readable way to handle delimited data (CSV, TSV, whitespace) than the usual mix of awk and cut. It replaces a cryptic positional "schema" like $11 with a declarative one built from named fields.

The goal was to build something with just enough features to offer a better UX while staying fast enough to stay out of the way.

    ps aux | grab -d whitespace -m user,pid,cpu,mem,_:4,start,time,command:gj --json --skip 1
    # Result:
    # {"user":"root","pid":"1","cpu":"0.0","mem":"0.0","start":"Mar10","time":"0:03","command":"/sbin/init"}
    # ...
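To make the mapping in that command concrete, here is a minimal Rust sketch of how such a mapping string could be applied to one whitespace-delimited line. It is not grab's actual code: the function `apply_mapping`, the `Value` type, and the error messages are hypothetical. It also assumes that a greedy entry consumes every remaining column, that joined columns are separated by a single space, and that leftover columns are ignored.

```rust
// Hypothetical sketch, not grab's implementation.
#[derive(Debug, PartialEq)]
enum Value {
    Single(String),    // plain or joined field
    Many(Vec<String>), // array field (name:N or name:g)
}

/// Applies `mapping` to `line`, returning (field name, value) pairs in order.
fn apply_mapping(mapping: &str, line: &str) -> Result<Vec<(String, Value)>, String> {
    let mut cols = line.split_whitespace();
    let mut out = Vec::new();

    for entry in mapping.split(',') {
        // An entry is "name" or "name:spec", where spec is N, g, Nj or gj.
        let (name, spec) = entry.split_once(':').unwrap_or((entry, ""));
        let join = spec.ends_with('j');
        let count = spec.trim_end_matches('j');

        // Collect the columns this entry consumes.
        let taken: Vec<&str> = if count == "g" {
            // Assumption: greedy means "everything that is left".
            cols.by_ref().collect()
        } else {
            let n = if count.is_empty() {
                1
            } else {
                count
                    .parse::<usize>()
                    .map_err(|_| format!("bad spec in `{entry}`"))?
            };
            let taken: Vec<&str> = cols.by_ref().take(n).collect();
            if taken.len() < n {
                return Err(format!(
                    "`{entry}` needs {n} column(s), found {}",
                    taken.len()
                ));
            }
            taken
        };

        if name == "_" {
            continue; // skipped columns never reach the output
        }
        let value = if spec.is_empty() {
            Value::Single(taken[0].to_string())
        } else if join {
            // Assumption: joined columns are separated by one space.
            Value::Single(taken.join(" "))
        } else {
            Value::Many(taken.iter().map(|s| s.to_string()).collect())
        };
        out.push((name.to_string(), value));
    }
    // Assumption: columns left over after the last entry are ignored.
    Ok(out)
}
```

Under these assumptions, applying the mapping from the command to a single ps aux row yields seven named fields: the four columns covered by _:4 are dropped, and the trailing command words are joined back into one string.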

Key points:

- Readable mappings: A schema is a comma-separated list of entries. <name> maps a column to a field, <name>:N aggregates N columns into an array, <name>:g maps greedily (into an array), _ skips columns, and <name>:Nj joins N columns into a single field. Suffixes combine, as in command:gj in the example above.

- Performance: Zero-copy tokenization in Rust. It processes ~17.1M fields/sec. On my machine, it shapes a ps aux dump into JSON in about 13ms.

- Schema enforcement: By default, it tells you exactly which line failed and why on stderr, rather than silently producing garbled data.

- Small & static: Single 800KB binary, zero dependencies.
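The post does not show grab's internals, so the following is only a sketch of what zero-copy tokenization with strict, per-line error reporting can look like in Rust. The function `tokenize`, the `LineError` type, and the fixed `expected` field count are assumptions made for illustration. The input is validated as UTF-8 once, and every field is then a `&str` slice borrowed from the original buffer rather than a copied `String`.

```rust
// Hypothetical sketch, not grab's implementation.
#[derive(Debug, PartialEq)]
struct LineError {
    line: usize, // 1-based line number in the input
    reason: String,
}

/// Splits `input` into rows of fields. The returned `&str` values borrow
/// from `input`, so field text is never copied; only the vectors holding
/// the slices are allocated.
fn tokenize<'a>(
    input: &'a [u8],
    delim: char,
    expected: usize,
) -> Result<Vec<Vec<&'a str>>, LineError> {
    // Strict UTF-8 validation up front, reported with a line number.
    let text = std::str::from_utf8(input).map_err(|e| {
        // Count newlines before the first invalid byte to locate the line.
        let line = input[..e.valid_up_to()]
            .iter()
            .filter(|&&b| b == b'\n')
            .count()
            + 1;
        LineError { line, reason: "invalid UTF-8".to_string() }
    })?;

    let mut rows = Vec::new();
    for (idx, line) in text.lines().enumerate() {
        let fields: Vec<&'a str> = line.split(delim).collect();
        // Schema enforcement: fail loudly instead of emitting a short row.
        if fields.len() != expected {
            return Err(LineError {
                line: idx + 1,
                reason: format!("expected {expected} fields, found {}", fields.len()),
            });
        }
        rows.push(fields);
    }
    Ok(rows)
}
```

A caller would print the `LineError` to stderr and stop, which mirrors the behavior described in the schema enforcement point: the failing line and the reason are reported instead of a silently malformed record.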

I’m sharing it now to see if this "schema-first" approach to shell piping feels as useful to others as it has been for me.

Repo: https://github.com/anwitars/grab

Crates.io: https://crates.io/crates/grab-cli
