GitHub Repository

SQLite-based LLM Inference Framework for Every Device

8 stars · C

Llm.sql – Run a 640MB LLM on SQLite, with 210MB peak RSS and 7.4 tok/s

by aldielshala·Apr 24, 2026·8 points·2 comments

AI Analysis

●●● Banger · Wizardry · Big Brain · Niche Gem

SQLite-based LLM inference that runs a ~640MB model at 210MB peak RSS by replacing OS paging with deterministic memory control.

Strengths

• Explicit memory management replaces OS paging; the deterministic access pattern makes Bélády-optimal page replacement applicable
• Zero PyTorch or Transformers dependencies: just Python, C, and SQLite
• Measured 7.4 tok/s on Qwen2.5-0.5B-INT8 at ~210MB peak RSS

Weaknesses

• Only demonstrated on a 0.5B model; performance on larger models remains unproven
• SQLite C extensions require compilation, limiting true plug-and-play deployment

Post Description

Hi HN,

I built llm.sql, an LLM inference framework that reimagines the LLM execution pipeline as a series of structured SQL queries atop SQLite.

The motivation: Edge LLMs are getting better, but hardware remains a bottleneck, especially RAM (size and bandwidth).

When available memory is smaller than the model plus its KV cache, the OS takes page faults and evicts pages using LRU-like strategies, causing throughput degradation that's hard to notice and even harder to debug. Yet the memory access pattern during LLM inference is deterministic: we know exactly which weights are needed and when. That makes even Bélády's optimal page replacement algorithm, which is normally impractical because it requires knowing future accesses, applicable here.
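To make the paging argument concrete, here is a minimal simulation sketch, not llm.sql's actual code. It assumes a hypothetical workload in which 8 layer-sized pages are read in the same order for each of 4 tokens while only 3 pages fit in memory, and it compares LRU with the farthest-next-use rule that Bélády's algorithm prescribes. The function names and the trace are illustrative assumptions.

```python
from collections import OrderedDict


def lru_misses(trace, capacity):
    """Count misses for an LRU cache holding `capacity` pages."""
    cache = OrderedDict()
    misses = 0
    for page in trace:
        if page in cache:
            cache.move_to_end(page)  # mark as most recently used
        else:
            misses += 1
            if len(cache) >= capacity:
                cache.popitem(last=False)  # evict the least recently used page
            cache[page] = True
    return misses


def belady_misses(trace, capacity):
    """Count misses when evicting the page whose next use is farthest away."""
    # next_use[i] is the index of the next access to trace[i], or infinity.
    next_use = [float("inf")] * len(trace)
    last_seen = {}
    for i in range(len(trace) - 1, -1, -1):
        next_use[i] = last_seen.get(trace[i], float("inf"))
        last_seen[trace[i]] = i

    cache = {}  # page -> index of its next access
    misses = 0
    for i, page in enumerate(trace):
        if page not in cache:
            misses += 1
            if len(cache) >= capacity:
                victim = max(cache, key=cache.get)  # farthest next use
                del cache[victim]
        cache[page] = next_use[i]
    return misses


# Hypothetical trace: layers 0..7 are read in order, once per generated token.
NUM_LAYERS, NUM_TOKENS, CAPACITY = 8, 4, 3
trace = [layer for _ in range(NUM_TOKENS) for layer in range(NUM_LAYERS)]

print("LRU misses:", lru_misses(trace, CAPACITY))
print("Farthest-next-use misses:", belady_misses(trace, CAPACITY))
```

On this trace LRU misses on all 32 accesses, because each page is evicted just before it is needed again, while evicting the page with the farthest next use brings the count down to 24. The second policy is only possible because the full trace is known in advance.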

So instead of letting the OS manage memory, llm.sql takes over:

- Model parameters are stored in SQLite BLOB tables

- Computational logic is implemented as SQLite C extensions

- Memory management is handled explicitly, not by the OS

- Zero heavy dependencies. No PyTorch, no Transformers. Just Python, C, or C++

This gives us explicit, deterministic control over what's in memory at each step of inference.
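The design above can be sketched in a few lines with Python's built-in sqlite3 module. This is an illustration under stated assumptions, not the project's real schema or API: the `weights` table, its columns, and the `matvec` function are hypothetical, and a Python function registered with `create_function` stands in for what llm.sql implements as a compiled C extension.

```python
import sqlite3
import struct


def pack_f32(values):
    """Serialize a list of floats into a little-endian float32 BLOB."""
    return struct.pack(f"<{len(values)}f", *values)


def unpack_f32(blob):
    """Deserialize a float32 BLOB back into a list of floats."""
    return list(struct.unpack(f"<{len(blob) // 4}f", blob))


def matvec(weight_blob, n_rows, n_cols, x_blob):
    """Row-major matrix-vector product over BLOB operands.

    Hypothetical stand-in for a compute kernel written as a C extension.
    """
    w = unpack_f32(weight_blob)
    x = unpack_f32(x_blob)
    y = [sum(w[r * n_cols + c] * x[c] for c in range(n_cols)) for r in range(n_rows)]
    return pack_f32(y)


conn = sqlite3.connect(":memory:")
# Hypothetical schema: one row per tensor, raw values stored in a BLOB column.
conn.execute(
    "CREATE TABLE weights ("
    "name TEXT PRIMARY KEY, n_rows INTEGER, n_cols INTEGER, data BLOB)"
)
conn.execute(
    "INSERT INTO weights VALUES (?, ?, ?, ?)",
    ("layer0.proj", 2, 3, pack_f32([1.0, 0.0, 2.0, 0.0, 1.0, -1.0])),
)
# Register the Python function so SQL statements can call it by name.
conn.create_function("matvec", 4, matvec)

x = pack_f32([1.0, 2.0, 3.0])
(y_blob,) = conn.execute(
    "SELECT matvec(data, n_rows, n_cols, ?) FROM weights WHERE name = ?",
    (x, "layer0.proj"),
).fetchone()
result = unpack_f32(y_blob)
print(result)  # [7.0, -1.0]
```

Because each tensor is fetched by an explicit query, the caller decides when a weight enters memory and when it can be dropped, rather than leaving that to the OS page cache.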

Results:

Running Qwen2.5-0.5B-INT8 (a ~640MB model), llm.sql reaches a peak RSS of ~210MB and a throughput of 7.40 tokens/s.

Alpha version is available on GitHub: https://github.com/xuxianghong12/llm.sql

I'm the developer, happy to answer any technical questions about the design and implementation.

Similar Projects

Developer Tools · ●● Solid

Slopsome – a VRAM fit calculator and tok/s database for local LLMs

VRAM calculator with crowd-sourced tok/s benchmarks when model cards already exist.

Niche Gem · Solve My Problem

NexAIGuy

305d ago

AI/ML · ●● Solid

Eatmydata.ai – Local-First Question-to-SQL-to-Dashboard AI

In-browser SQLite with LLM sanitization when chat-with-data tools already exist.

Big Brain · Niche Gem

dennis16384

829d ago

Developer Tools · ●●● Banger

SQL-pipe – Query CSV streams with SQLite syntax (written in Zig)

SQL queries on CSV streams—instant, zero-setup alternative to awk and sqlite3 boilerplate.

Ship It · Solve My Problem · Slick

vmvarela

143mo ago

Developer Tools · ●●● Banger

Antenna – RSS reader with a built-in MCP server

Local-first RSS reader with built-in MCP server for agent-accessible subscription graphs.

Zero to One · Solve My Problem · Niche Gem

toddllm

1761mo ago

Developer Tools · ●● Solid

Memimpact – memory footprint CLI written in Rust

Cleaner alternative to /usr/bin/time for quick memory profiling, no recompilation required.

Solve My Problem · Ship It

goldenarm

203mo ago

Developer Tools · ●● Solid

Testing SQL logic without a real database

Replaces slow TestContainers with dialect-specific in-memory SQL; fills real testing pain.

Solve My Problem · Niche Gem

chrisulson

113mo ago