Parsing hostile industrial data in 64MB WASM sandboxes

by bneb-dev·Mar 18, 2026·2 points·0 comments

AI Analysis

WASM sandboxes per parser replace $200K legacy gateways at $0.05/MB.

Strengths
  • Wasmtime fuel metering kills infinite loops before they crash production systems.
  • Hot-swappable .wasm drivers deploy without maintenance windows or restarts.
  • Arrow IPC output enables zero-copy ingestion into Spark and DuckDB pipelines.
Weaknesses
  • Niche audience limits adoption to teams with industrial protocol requirements.
  • Production claims need verification in real OT environments with hostile data.
Target Audience

Industrial IoT teams, data engineers handling legacy protocol integration

Similar To

Kepware · Matrikon · Softing

Post Description

Hey all, back again.

Ingelt is a Rust/Axum gateway that runs parsers for 33 different legacy protocols (Modbus, BACnet, X12, SECS/GEM, etc.), each compiled to WebAssembly and executed in an isolated sandbox.

I have some background in data engineering, and in the past it was so frustrating to watch a quirk in a single row (a special character, a badly escaped quote, etc.) bring down an entire pipeline.

I've been thinking about the problem space and how it abstracts nicely to a common API surface. As I understand it, industrial data pipelines fail in a similar way: a legacy C/C++ Modbus or EDIFACT parser hits a malformed hex payload and segfaults. Industrial and supply chain data is basically hostile by default.

The architecture is pretty straightforward:

Strict Isolation: Every parse request gets its own pooled Wasmtime instance. If a parser hits an infinite loop, Wasmtime fuel metering kills it. If a malformed payload tries to allocate too much memory, a 64MB StoreLimitsBuilder ceiling traps it. The host never panics.
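To make the failure contract concrete, here is a minimal std-only model of the two limits described above. It is not the Wasmtime API and not Ingelt's code: in Wasmtime the real mechanisms are enabled through `Config::consume_fuel` and `StoreLimitsBuilder::memory_size`. The names `Sandbox`, `Trap`, and `hostile_parse`, and the toy record format, are all hypothetical. The point is only that a runaway loop or an oversized allocation comes back to the host as an ordinary error value.

```rust
/// Why a guest run was stopped. Hypothetical names, for illustration only.
#[derive(Debug, PartialEq)]
enum Trap {
    OutOfFuel,
    MemoryLimit,
}

/// Per-request budget, modelling the fuel and memory limits from the post.
struct Sandbox {
    fuel: u64,
    mem_used: usize,
    mem_limit: usize,
}

impl Sandbox {
    fn new(fuel: u64, mem_limit: usize) -> Self {
        Sandbox { fuel, mem_used: 0, mem_limit }
    }

    /// Charge one unit of fuel per simulated step; fail once the budget is spent.
    fn tick(&mut self) -> Result<(), Trap> {
        if self.fuel == 0 {
            return Err(Trap::OutOfFuel);
        }
        self.fuel -= 1;
        Ok(())
    }

    /// Account for an allocation request and refuse it past the ceiling.
    fn alloc(&mut self, bytes: usize) -> Result<(), Trap> {
        let next = self.mem_used.checked_add(bytes).ok_or(Trap::MemoryLimit)?;
        if next > self.mem_limit {
            return Err(Trap::MemoryLimit);
        }
        self.mem_used = next;
        Ok(())
    }
}

/// A deliberately buggy toy parser. Assumed record format: each record starts
/// with a length byte that counts the whole record, itself included.
fn hostile_parse(sb: &mut Sandbox, payload: &[u8]) -> Result<usize, Trap> {
    let mut pos = 0;
    let mut records = 0;
    while pos < payload.len() {
        sb.tick()?;
        let len = payload[pos] as usize;
        // Bug 1: trusts an attacker-controlled size (interpreted here as MiB).
        sb.alloc(len * 1024 * 1024)?;
        // Bug 2: a zero length never advances, so the loop would spin forever.
        pos += len;
        records += 1;
    }
    Ok(records)
}
```

With a 64 MiB ceiling and a budget of 10,000 steps, the payload `[0]` ends in `Trap::OutOfFuel` and `[200]` ends in `Trap::MemoryLimit`, while a well-formed payload such as `[1, 1, 1]` parses normally. In each case the caller gets a `Result` it can turn into an error response.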

Streaming ABI: I recently added a chunked streaming ABI (parse_init -> parse_chunk -> parse_finalize). It can now ingest a 500MB maritime EDIFACT file with a flat ~1MB memory footprint inside the guest.
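The three-call shape can be sketched in plain Rust without any Wasm plumbing. This is an illustrative guess at the guest-side logic, not Ingelt's actual ABI: the function names come from the post, but the signatures, the `StreamState` struct, and the `parse_all` driver are assumptions. It splits EDIFACT segments on the default terminator `'`, honours the default release character `?`, and carries only the unfinished segment across chunk boundaries, which is why guest memory stays flat regardless of file size. Line breaks between segments and a custom UNA header are not handled.

```rust
const TERMINATOR: u8 = b'\''; // default EDIFACT segment terminator
const RELEASE: u8 = b'?';     // default EDIFACT release (escape) character

/// State kept between calls. Hypothetical layout, for illustration only.
struct StreamState {
    partial: Vec<u8>, // bytes of the segment that straddles a chunk boundary
    escaped: bool,    // true if the previous byte was the release character
}

fn parse_init() -> StreamState {
    StreamState { partial: Vec::new(), escaped: false }
}

/// Consume one chunk and return the segments completed within it.
fn parse_chunk(st: &mut StreamState, chunk: &[u8]) -> Vec<String> {
    let mut done = Vec::new();
    for &b in chunk {
        if st.escaped {
            // An escaped byte is taken literally, even if it is a terminator.
            st.partial.push(b);
            st.escaped = false;
        } else if b == RELEASE {
            st.escaped = true;
        } else if b == TERMINATOR {
            let seg = std::mem::take(&mut st.partial);
            done.push(String::from_utf8_lossy(&seg).into_owned());
        } else {
            st.partial.push(b);
        }
    }
    done
}

/// Fail if the stream ended in the middle of a segment or an escape.
fn parse_finalize(st: StreamState) -> Result<(), String> {
    if st.escaped || !st.partial.is_empty() {
        return Err(format!("truncated input: {} dangling bytes", st.partial.len()));
    }
    Ok(())
}

/// Host-side driver: feed the data in fixed-size chunks.
/// Segments are collected here only to keep the example small; a real host
/// would forward each batch downstream instead of holding all of them.
fn parse_all(data: &[u8], chunk_size: usize) -> Result<Vec<String>, String> {
    let mut st = parse_init();
    let mut out = Vec::new();
    for chunk in data.chunks(chunk_size.max(1)) {
        out.extend(parse_chunk(&mut st, chunk));
    }
    parse_finalize(st)?;
    Ok(out)
}
```

Because the escape flag and the partial segment live in `StreamState`, the result does not depend on where the chunk boundaries fall: feeding the same bytes 1 at a time or 5 at a time yields the same segments.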

Output: It spits out clean JSON, or Apache Arrow IPC if you want to pipe it directly into a data warehouse.

It is definitely a trade-off. Crossing the Wasm boundary and doing JSON serialization inside the guest adds a few microseconds of latency compared to a raw native parser, but many parsers are pretty slow. I think the trade-off makes sense.

I'd appreciate any thoughts on the Wasm isolation model, the streaming ABI approach, or whether you think there's a better way to handle the host-guest memory bridge. I'm coming at this as an outsider to industrial protocols and the pain points of using them, and I'm trying to learn quickly.

Thanks
