KV-psi, using Linux PSI to trim an LLM KV cache

Author: infiniteregrets

by infiniteregrets · Jun 27, 2026 · 2 points · 0 comments


AI Analysis

●● Solid · Big Brain · Bold Bet

Using Linux PSI signals to prune the KV cache is a genuinely clever idea for edge inference.

Strengths

• Novel application of PSI memory-pressure signals to LLM runtime cache management
• The benchmark table shows the PSI variant shrinking the KV cache from 1547 to 1004 tokens under memory pressure
• Targets the unified-memory constraints of the Jetson Orin, where swap behavior matters
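To make "PSI memory pressure signals" concrete: on Linux 4.20+ kernels built with PSI, `/proc/pressure/memory` exposes lines such as `some avg10=0.00 avg60=0.00 avg300=0.00 total=0`, where `avgN` is the percentage of wall time tasks were stalled on memory over the last N seconds and `total` is cumulative stall time in microseconds. The following is a minimal parsing sketch; the function names are hypothetical and not taken from KV-psi.

```python
from pathlib import Path


def parse_psi(text: str) -> dict[str, dict[str, float]]:
    """Parse PSI text such as the contents of /proc/pressure/memory.

    Each non-empty line looks like:
        some avg10=12.50 avg60=4.00 avg300=1.10 total=987654
    avgN values are stall percentages over the last N seconds;
    total is cumulative stall time in microseconds.
    """
    result: dict[str, dict[str, float]] = {}
    for line in text.splitlines():
        if not line.strip():
            continue  # tolerate blank lines
        kind, *fields = line.split()  # "some" or "full", then key=value pairs
        result[kind] = {
            key: float(value)
            for key, value in (field.split("=", 1) for field in fields)
        }
    return result


def read_memory_pressure(path: str = "/proc/pressure/memory") -> dict[str, dict[str, float]]:
    """Read the system-wide memory PSI file (requires a PSI-enabled kernel)."""
    return parse_psi(Path(path).read_text())
```

A runtime polling this could watch `parse_psi(...)["some"]["avg10"]` as its pressure signal; the kernel also supports PSI triggers (writing a threshold/window to the file and polling for events), which avoid busy polling.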

Weaknesses

• The author admits to limited benchmarking and an experimental, "vibed" approach
• The llama.cpp integration is a reference implementation, not a production-ready library

Post Description

I thought it'd be interesting to use Linux PSI (Pressure Stall Information) in an LLM runtime to trim the KV cache. IMO this is mainly useful for edge devices with unified memory, like the Jetson Orin Nano Super dev kit. I haven't benchmarked it much yet, but I plan to do more over time and see whether I can make real use of it as I run local LLMs. Let me know if it makes sense :P (of course, I vibed this idea)
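One way the "trim the KV cache" step could work, sketched under loud assumptions: the thresholds, the linear ramp from pressure to token budget, and the keep-a-prefix-plus-recent-tokens eviction policy below are all hypothetical and are not KV-psi's actual algorithm. The sketch maps the PSI `some avg10` percentage to a token budget, then lists which KV positions a runtime would evict to meet it.

```python
def target_kv_tokens(current: int, some_avg10: float,
                     low: float = 5.0, high: float = 40.0,
                     min_fraction: float = 0.5) -> int:
    """Map memory pressure (PSI some avg10, in percent) to a KV token budget.

    Hypothetical policy: at or below `low`, keep everything; at or above
    `high`, shrink to `min_fraction` of the current size; interpolate
    linearly in between.
    """
    if some_avg10 <= low:
        return current
    # Fraction of the way from `low` to `high`, clamped to 1.0.
    t = min((some_avg10 - low) / (high - low), 1.0)
    keep = 1.0 - t * (1.0 - min_fraction)
    return max(1, int(current * keep))


def trim_positions(n_tokens: int, budget: int, n_keep: int = 4) -> list[int]:
    """Return the KV positions to evict so that `budget` tokens remain.

    Hypothetical policy: keep the first `n_keep` positions (e.g. BOS or a
    system prompt) plus the most recent tokens, dropping the oldest tokens
    in between.
    """
    if budget >= n_tokens:
        return []  # nothing to evict
    n_keep = min(n_keep, budget)
    n_recent = budget - n_keep
    # Evict the contiguous middle range [n_keep, n_tokens - n_recent).
    return list(range(n_keep, n_tokens - n_recent))
```

For example, `target_kv_tokens(1000, 22.5)` returns 750, and `trim_positions(10, 6)` evicts positions 4 through 7. Evicting one contiguous middle range keeps the bookkeeping simple for a runtime that removes KV entries by position range; other policies (e.g. attention-score-based eviction) would need more state.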

Similar Projects

Developer Tools · ●● Solid

A simple CLI to stop copying repos for LLM reference

Keeps a single cached store of repos and gives you an interactive CLI to link them into .llm/reference, so you can check a small dotllm.json into projects and run dotllm sync on fresh clones. It’s a pragmatic, low-fuss, nicely opinionated alternative to submodules or ad-hoc scripts, but the project could use clearer docs on cross-platform behaviour, conflict resolution, and scalability.

Niche Gem · Solve My Problem

dboon

144mo ago

AI/ML · ● Mid