GitHub Repository

Wrap Python functions and shell commands as content-addressed transformations. Cache results, run them locally or on a cluster, and share them by checksum.

23 stars · Shell

Seamless – Content-addressed computation caching for Python and bash

by sjdv1982 · Apr 17, 2026 · 1 point · 0 comments

AI Analysis

●●● Banger · Big Brain · Wizardry

Checksum-based computation identity beats Make and DVC for reproducible pipelines.

Strengths
  • Content-addressed identity means identical computations auto-deduplicate across machines
  • SQLite database makes sharing cached results as simple as copying a file
  • Wraps both Python functions and shell commands without code changes
Weaknesses
  • Alpha stage: interactive features from 0.x are still being ported to the new architecture
  • 92 open issues suggest rough edges remain for production use
Target Audience

Data scientists and research engineers building reproducible pipelines

Similar To

Nix · DVC · Make

Post Description

Hey HN, Sjoerd de Vries here. I have worked on Seamless for nearly 10 years now. It has been used in my lab, but I was always around for troubleshooting. This is the first time that I think it's ready to stand on its own. I would love to hear your thoughts about it.

It started as a hobby project — I had an itch about programming not being at-your-fingertips enough. Then I applied it to my work as a bioinformatics research engineer. The early versions focused on interactive workflows. After a year or two I realized that to make interactivity work properly, you need really good DAG tracking, so checksums were added everywhere. My lab built a collaborative web server with it that we published. More recently I've rebuilt it around the command line, persistent caching, and remote deployment.

It's still in alpha, but the core is usable.

Core idea: same code + same inputs = same result, identified by checksum. If you've already computed it, you don't compute it again.
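To make the core idea concrete, here is a minimal sketch of content-addressed caching using only the standard library. It is not Seamless's implementation: `ToyCache`, its `run` method, the in-memory dictionaries, and the choice of SHA-256 over JSON are all assumptions made for illustration.

```python
import hashlib
import json


def checksum(value) -> str:
    """SHA-256 hex digest of a JSON-serializable value (an assumed encoding)."""
    payload = json.dumps(value, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


class ToyCache:
    """Hypothetical in-memory cache keyed by the checksum of code plus inputs."""

    def __init__(self):
        self.results = {}    # transformation checksum -> result checksum
        self.values = {}     # result checksum -> result value
        self.executions = 0  # how many times code was actually run

    def run(self, code: str, func_name: str, **inputs):
        # The identity of a computation is the checksum of its code
        # together with the checksum of every named input.
        transformation = {
            "code": checksum(code),
            "function": func_name,
            "inputs": {name: checksum(v) for name, v in inputs.items()},
        }
        key = checksum(transformation)
        if key in self.results:
            # Same code and same inputs: return the stored result.
            return self.values[self.results[key]]
        namespace = {}
        exec(code, namespace)  # define the function from its source text
        result = namespace[func_name](**inputs)
        self.executions += 1
        result_key = checksum(result)
        self.values[result_key] = result
        self.results[key] = result_key
        return result


ADD_SOURCE = "def add(a, b):\n    return a + b\n"
cache = ToyCache()
first = cache.run(ADD_SOURCE, "add", a=2, b=3)   # executes the code
second = cache.run(ADD_SOURCE, "add", a=2, b=3)  # served from the cache
```

Because the key depends only on content, any change to the source text or to an input value produces a different checksum and therefore a fresh execution.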

Two entry points:

Python:

    from seamless.transformer import direct

    @direct
    def add(a, b):
        import time
        time.sleep(5)
        return a + b

    add(2, 3)  # runs, caches result
    add(2, 3)  # cache hit: instant

Bash:

    seamless-run 'seq 1 10 | tac && sleep 5'  # runs, caches result
    seamless-run 'seq 1 10 | tac && sleep 5'  # cache hit: instant

With persistent caching enabled, results are stored as checksum-to-checksum mappings in a small SQLite database that can be shared with collaborators, so that they get cache hits too.
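A checksum-to-checksum mapping in SQLite might look roughly like the sketch below. The table name `results`, its columns, and the helpers `open_cache`, `store`, and `lookup` are hypothetical and are not Seamless's actual schema. The table holds only checksums; where the result bytes themselves live is outside this sketch.

```python
import hashlib
import shutil
import sqlite3
import tempfile
from pathlib import Path


def sha256_hex(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


def open_cache(path):
    """Open (or create) a cache database with one assumed table."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS results ("
        "transformation TEXT PRIMARY KEY, result TEXT NOT NULL)"
    )
    conn.commit()
    return conn


def store(conn, transformation: str, result: str) -> None:
    """Record that a transformation checksum maps to a result checksum."""
    conn.execute(
        "INSERT OR REPLACE INTO results VALUES (?, ?)", (transformation, result)
    )
    conn.commit()


def lookup(conn, transformation: str):
    """Return the result checksum, or None on a cache miss."""
    row = conn.execute(
        "SELECT result FROM results WHERE transformation = ?", (transformation,)
    ).fetchone()
    return row[0] if row else None


# Stand-in checksums for a transformation and its result.
transformation_key = sha256_hex(b"add(2, 3)")
result_key = sha256_hex(b"5")

with tempfile.TemporaryDirectory() as tmp:
    mine = Path(tmp) / "mine.db"
    theirs = Path(tmp) / "theirs.db"

    conn = open_cache(mine)
    store(conn, transformation_key, result_key)
    conn.close()

    shutil.copy(mine, theirs)  # sharing the cache is copying the file

    shared = open_cache(theirs)
    hit = lookup(shared, transformation_key)
    miss = lookup(shared, sha256_hex(b"add(2, 4)"))
    shared.close()
```

A collaborator who opens the copied file gets a hit for any transformation that was already recorded, without rerunning it.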

Execution scales by changing config, not code: in-process, spawned workers, or a Dask-backed HPC cluster.
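The general pattern of switching backends through configuration can be sketched with `concurrent.futures`. This is a generic illustration, not Seamless's configuration format: the `backend` and `workers` keys, `InlineExecutor`, `make_executor`, and `square_all` are all made up for the example, and the thread pool merely stands in for spawned workers or a cluster.

```python
from concurrent.futures import Executor, Future, ThreadPoolExecutor


class InlineExecutor(Executor):
    """Runs each task immediately in the calling thread."""

    def submit(self, fn, /, *args, **kwargs):
        future = Future()
        try:
            future.set_result(fn(*args, **kwargs))
        except Exception as exc:
            future.set_exception(exc)
        return future


def make_executor(config: dict) -> Executor:
    """Pick a backend from a (hypothetical) config dictionary."""
    backend = config.get("backend", "inline")
    if backend == "inline":
        return InlineExecutor()
    if backend == "threads":
        return ThreadPoolExecutor(max_workers=config.get("workers", 2))
    raise ValueError(f"unknown backend: {backend!r}")


def square(value):
    return value * value


def square_all(values, config):
    # The calling code is identical for every backend.
    with make_executor(config) as executor:
        return list(executor.map(square, values))


inline_result = square_all([1, 2, 3], {"backend": "inline"})
threaded_result = square_all([1, 2, 3], {"backend": "threads", "workers": 2})
```

Only the dictionary passed to `square_all` changes between the two calls; the function being run and the code that submits it stay the same.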

Remote execution also doubles as a reproducibility test. If your code produces the same result on a clean worker, it's reproducible. If not, Seamless helped you find the problem — whether it's a missing dependency, an undeclared input, or a platform sensitivity.
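The reproducibility check boils down to comparing result checksums across independent runs. The sketch below reruns a command in fresh local processes, which is a much weaker test than a clean remote worker (same machine, same installed packages, same platform). `output_checksum` and `is_reproducible` are hypothetical helpers, not part of Seamless.

```python
import hashlib
import subprocess
import sys


def output_checksum(argv):
    """Run a command in a fresh process and checksum its stdout."""
    completed = subprocess.run(argv, capture_output=True, check=True)
    return hashlib.sha256(completed.stdout).hexdigest()


def is_reproducible(argv, runs=2):
    """True if every run produced byte-identical output."""
    return len({output_checksum(argv) for _ in range(runs)}) == 1


# Depends only on its own code: the same output every time.
deterministic = [sys.executable, "-c", "print(sum(range(10)))"]

# Depends on an undeclared input (OS randomness): output differs per run.
leaky = [sys.executable, "-c", "import os; print(os.urandom(16).hex())"]

deterministic_ok = is_reproducible(deterministic)
leaky_ok = is_reproducible(leaky)
```

A mismatch does not say what went wrong, only that something outside the declared code and inputs influenced the result.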

Built for scientific computing and data pipelines, but works for anything pipeline-shaped.
