GitHub Repository · 0 stars · Python

KV-psi, using Linux PSI to trim an LLM KV cache

by infiniteregrets · Jun 27, 2026 · 2 points · 0 comments

AI Analysis

●● Solid · Big Brain · Bold Bet

Using Linux PSI signals to prune KV cache is genuinely clever for edge inference.
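
For context on what a "PSI signal" looks like: on kernels built with PSI support (4.20+), /proc/pressure/memory contains a "some" line and a "full" line, each with avg10, avg60 and avg300 fields (percentage of wall time tasks were stalled on memory over 10, 60 and 300 seconds) plus a cumulative total in microseconds. The minimal sketch below parses that format in Python; the function names are hypothetical and are not taken from the KV-psi repository.

```python
"""Minimal sketch: read Linux PSI memory pressure (hypothetical helper names)."""
from pathlib import Path

# Assumption: kernel built with CONFIG_PSI, so this file exists.
PSI_MEMORY = Path("/proc/pressure/memory")


def parse_psi(text: str) -> dict[str, dict[str, float]]:
    """Parse lines like 'some avg10=1.50 avg60=0.80 avg300=0.20 total=12345'.

    Returns a nested dict keyed by line kind ('some' / 'full'), then field name.
    """
    result: dict[str, dict[str, float]] = {}
    for line in text.strip().splitlines():
        kind, *fields = line.split()
        result[kind] = {
            key: float(value)
            for key, value in (field.split("=", 1) for field in fields)
        }
    return result


def read_memory_pressure(path: Path = PSI_MEMORY) -> dict[str, dict[str, float]]:
    """Read and parse the live memory PSI file, e.g. result['some']['avg10']."""
    return parse_psi(path.read_text())
```

A runtime would typically watch result["some"]["avg10"] (some tasks stalled) as an early warning and result["full"]["avg10"] (all non-idle tasks stalled at once) as a sign of severe thrashing.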

Strengths
  • Novel application of PSI memory pressure signals to LLM runtime cache management
  • Benchmark table shows the PSI variant reducing the KV cache from 1547 to 1004 tokens under pressure
  • Targets unified memory constraints on Jetson Orin where swap behavior matters
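
For scale, the benchmark figures quoted in the strengths list amount to trimming roughly a third of the cache:

```latex
\frac{1547 - 1004}{1547} = \frac{543}{1547} \approx 0.351
\quad (\approx 35\%\ \text{fewer KV tokens, } \approx 65\%\ \text{retained})
```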
Weaknesses
  • Author admits limited benchmarking and an experimental, "vibed" approach
  • llama.cpp integration is a reference implementation, not a production-ready library
Category
Target Audience

Edge AI developers running local LLMs on memory-constrained hardware like Jetson

Similar To

vLLM · llama.cpp · MLC LLM

Post Description

I thought it'd be interesting to use Linux PSI (Pressure Stall Information) in an LLM runtime to trim the KV cache. This is mainly useful, imo, for edge devices like the Jetson Orin Nano Super kit, which have unified memory. I haven't benched much yet, but plan to do more over time and see if I can make real use of it as I run local LLMs. Let me know if it makes sense :P (I of course vibed this idea)
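
One way to picture the idea described in the post: treat the PSI "some avg10" percentage as an input to a KV-cache token budget that shrinks under pressure and grows back when memory is calm, then evict the oldest tokens while keeping a short prefix. The sketch below is hypothetical; the thresholds, the shrink factor, the hysteresis band and the keep-prefix heuristic are illustrative assumptions, not KV-psi's actual policy or values, and a real runtime would evict cache entries rather than a Python list.

```python
"""Hypothetical sketch: map PSI memory pressure to a KV-cache token budget."""
from dataclasses import dataclass


@dataclass
class KVTrimPolicy:
    # All values are illustrative assumptions, not taken from KV-psi.
    high_pressure: float = 10.0  # 'some avg10' percent at which to shrink
    low_pressure: float = 2.0    # at or below this, let the budget grow back
    shrink_factor: float = 0.75  # keep 75% of the budget per high-pressure step
    grow_step: int = 128         # tokens restored per calm step
    min_tokens: int = 256        # never trim below this many tokens
    max_tokens: int = 4096       # the runtime's configured context size

    def next_budget(self, budget: int, some_avg10: float) -> int:
        """Return the new KV-token budget given the current PSI reading."""
        if some_avg10 >= self.high_pressure:
            return max(self.min_tokens, int(budget * self.shrink_factor))
        if some_avg10 <= self.low_pressure:
            return min(self.max_tokens, budget + self.grow_step)
        return budget  # between thresholds: hold steady (hysteresis band)


def trim_kv(tokens: list[int], budget: int, keep_prefix: int = 4) -> list[int]:
    """Drop the oldest tokens after a small kept prefix so len(result) <= budget."""
    if len(tokens) <= budget:
        return tokens
    keep_prefix = min(keep_prefix, budget)
    tail = tokens[len(tokens) - (budget - keep_prefix):]
    return tokens[:keep_prefix] + tail
```

The gap between high_pressure and low_pressure is there so the budget does not oscillate when pressure hovers around a single threshold; in a serving loop one would sample PSI periodically, call next_budget, and trim only when the cache exceeds the new budget.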

Similar Projects

Developer Tools · ●● Solid

A simple CLI to stop copying repos for LLM reference

Keeps a single cached store of repos and gives you an interactive CLI to link them into .llm/reference so you can check a small dotllm.json into projects and run dotllm sync on fresh clones. It’s a pragmatic, low-fuss alternative to submodules or ad-hoc scripts — nicely opinionated — but the project could use clearer docs around cross-platform behaviour, conflict resolution, and scalability.

Niche Gem · Solve My Problem
dboon
144mo ago