GitHub Repository

Claude Code for local LLMs. Unified backend, setup, and coding harness for your own models.

45 starsPython

OpenJet – An offline agent harness for memory-constrained edge hardware

Name: OpenJet – An offline agent harness for memory-constrained edge hardware
Availability: InStock
Author: lforster

by lforster·Mar 15, 2026·1 point·0 comments

Visit Project View on HN

AI Analysis

●●●BangerNiche GemSolve My ProblemShip It

Context condensing under memory pressure solves the actual pain of edge AI agents.

Strengths

•Automatic context condensing prevents OOM crashes when 9B models eat all available RAM.
•Air-gapped mode with persistent memory files handles interrupted sessions gracefully.
•Supports llama.cpp, SGLang, and TensorRT-LLM backends with unified TUI interface.

Weaknesses

•Requires manual llama-server build — not as turnkey as cloud-hosted alternatives.
•Jetson-specific optimization limits appeal to broader local-LLM audience.

Post Description

Hi HN,

I am building a terminal UI for self-hosted AI agents on Jetsons and other edge devices with unified memory.

The reason I started it was that most local agent harnesses seems aimed at machines with plenty of RAM and a stable internet-connected developer environment. On Jetson-class hardware, the annoying problems are different: context growth eats memory, sessions break, models may fit but leave very little headroom, and a lot of tools assumes cloud access.

Recent additions include:

- air-gapped mode - automatic context condensing under memory pressure - persistent memory files and /memory controls - harness modes for chat/code/review/debug workflows - replayable traces for evals/debugging - multimodal local image input - OpenTelemetry support

I’d love for you to try it out. The code is up on GitHub, and contributions/roasts of my memory management are very welcome. On a 8GB, I got the latest Qwen3.5-9B running (it just about fits in the memory).

Contributions are welcome ofc. Github: https://github.com/L-Forster/open-jet