QLoRA fine-tuning in .zse INT4 format by ZSE

by zyoralabs·Mar 4, 2026·1 point·0 comments

AI Analysis

●● Solid · Wizardry · Solve My Problem

Train 72B models on A100-40GB via INT4 quantization, but GPTQ and bitsandbytes already exist.

Strengths

•Verified end-to-end benchmarks (H200, Qwen) with actual VRAM/speed numbers, no hand-waving.
•Enables 7B training on an RTX 3070 (8GB) and 70B on dual 3090s, a genuine accessibility gain.
•Clean Python API with minimal boilerplate (LoRA adapter ~25MB, 0.2% params trainable).

Weaknesses

•Proprietary .zse format is a walled garden; the ecosystem is already converging on safetensors + GPTQ.
•No evidence that this outperforms bitsandbytes + AutoGPTQ in quality or speed; the post only claims feature parity.

Post Description

Released v1.4.0 of ZSE with QLoRA fine-tuning support for INT4 models.

Verified benchmarks (H200 GPU, Qwen models):

Model   File Size   VRAM (Inference)   VRAM (+ Training)   Speed
7B      5.57 GB     5.67 GB            ~8 GB               37.2 tok/s
14B     9.95 GB     10.08 GB           ~14 GB              20.8 tok/s
32B     19.23 GB    19.47 GB           ~26 GB              10.9 tok/s
72B     41.21 GB    41.54 GB           ~52 GB              6.3 tok/s

What this means:

•Train 7B models on RTX 3070/4070 (8GB)
•Train 32B models on RTX 3090/4090 (24GB)
•Train 70B models on A100-40GB or 2x 3090
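
As a rough illustration of what the benchmark table implies, the sketch below computes how much VRAM training adds on top of inference for each model size. The figures are copied from the table; the helper names (`BENCHMARKS`, `training_overhead_gb`, `training_overhead_ratio`) are hypothetical and not part of ZSE, and the training values are the post's approximate ("~") numbers, so the results are approximate too.

```python
# VRAM figures in GB, copied from the post's benchmark table.
# "training" values are the approximate ("~") numbers from the post.
BENCHMARKS = {
    "7B": {"inference": 5.67, "training": 8.0},
    "14B": {"inference": 10.08, "training": 14.0},
    "32B": {"inference": 19.47, "training": 26.0},
    "72B": {"inference": 41.54, "training": 52.0},
}


def training_overhead_gb(name: str) -> float:
    """Extra VRAM (GB) that training needs beyond inference."""
    row = BENCHMARKS[name]
    return round(row["training"] - row["inference"], 2)


def training_overhead_ratio(name: str) -> float:
    """Training VRAM divided by inference VRAM."""
    row = BENCHMARKS[name]
    return round(row["training"] / row["inference"], 2)


if __name__ == "__main__":
    for name in BENCHMARKS:
        gb = training_overhead_gb(name)
        ratio = training_overhead_ratio(name)
        print(f"{name}: +{gb} GB for training ({ratio}x inference)")
```

By these numbers the absolute overhead grows with model size, while the relative overhead shrinks, from roughly 1.4x inference at 7B to roughly 1.25x at 72B.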

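Usage:

    from zse.format import load_zse_model
    from zse.training import LoRAConfig, add_lora_to_model
    # Assumption: save_lora_adapter is exported by zse.training;
    # the post calls it but does not show where it is imported from.
    from zse.training import save_lora_adapter

    # Load the INT4 .zse checkpoint, then attach LoRA adapters to it
    model, tokenizer, info = load_zse_model("model.zse")
    model = add_lora_to_model(model, LoRAConfig(rank=16, alpha=32))

    # Train normally, adapter is ~25MB
    save_lora_adapter(model, "my_adapter.safetensors")

Trainable params: 0.2% of model (12M params for 7B)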
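
The adapter size and trainable-parameter figures quoted above can be sanity-checked with the standard LoRA arithmetic: a rank-r adapter on a d_out x d_in weight adds r * (d_in + d_out) parameters. The sketch below is independent of ZSE. The function names are hypothetical, 2 bytes per parameter (fp16/bf16 storage) is an assumption the post does not state, and 7e9 is used as the nominal size of a "7B" model.

```python
def lora_params(d_in: int, d_out: int, rank: int) -> int:
    """Parameters added by one LoRA pair: A is (rank x d_in), B is (d_out x rank)."""
    return rank * (d_in + d_out)


def adapter_size_mb(n_params: int, bytes_per_param: int = 2) -> float:
    """Adapter size in MB, assuming fp16/bf16 storage (an assumption, not from the post)."""
    return n_params * bytes_per_param / 1e6


def trainable_fraction(n_trainable: float, n_total: float) -> float:
    """Share of the model's parameters that are trainable."""
    return n_trainable / n_total


if __name__ == "__main__":
    # One rank-16 adapter on a hypothetical 4096 x 4096 projection
    print(lora_params(4096, 4096, rank=16))

    # The post's figure: 12M trainable parameters for a 7B model
    print(adapter_size_mb(12_000_000))
    print(f"{trainable_fraction(12e6, 7e9):.2%}")
```

Under these assumptions, 12M parameters come to 24 MB and about 0.17% of a nominal 7B model, in line with the post's "~25MB" and "0.2%" figures.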

pip install zllm-zse[training]

Code: github.com/zyora-ai/zse
