
QLoRA fine-tuning in .zse INT4 format by ZSE

by zyoralabs · Mar 4, 2026 · 1 point · 0 comments

AI Analysis

●● Solid · Wizardry · Solve My Problem

Promises 70B-class training on an A100-40GB via INT4 quantization, but GPTQ and bitsandbytes already cover similar ground.

Strengths
  • Verified end-to-end benchmarks (H200, Qwen) with actual VRAM/speed numbers, no hand-waving.
  • Enables 7B training on an RTX 3070 (8GB) and 70B on dual 3090s, a genuine accessibility gain.
  • Clean Python API with minimal boilerplate (LoRA adapter ~25MB, 0.2% params trainable).
Weaknesses
  • Proprietary .zse format is a walled garden; the ecosystem is already converging on safetensors + GPTQ.
  • No evidence that this outperforms bitsandbytes + AutoGPTQ in quality or speed; it only claims feature parity.
Target Audience

ML engineers and researchers fine-tuning LLMs on consumer hardware

Similar To

bitsandbytes · AutoGPTQ · GPTQ

Post Description

Released v1.4.0 of ZSE with QLoRA fine-tuning support for INT4 models.

Verified benchmarks (H200 GPU, Qwen models):

Model   File Size   VRAM (Inference)   VRAM (+ Training)   Speed
7B      5.57 GB     5.67 GB            ~8 GB               37.2 tok/s
14B     9.95 GB     10.08 GB           ~14 GB              20.8 tok/s
32B     19.23 GB    19.47 GB           ~26 GB              10.9 tok/s
72B     41.21 GB    41.54 GB           ~52 GB              6.3 tok/s

What this means:

  • Train 7B models on RTX 3070/4070 (8GB)
  • Train 32B models on RTX 3090/4090 (24GB)
  • Train 70B models on A100-40GB or 2x 3090
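The benchmark figures and GPU pairings above can be cross-checked with a little arithmetic. The sketch below copies the VRAM numbers from the table and computes the extra memory training adds on top of inference, plus the nominal headroom on a given card. The function names are hypothetical, and using the 72B row as a stand-in for "70B" is an assumption. The "~" training figures are approximate, so the results are rough guides only.

```python
# VRAM figures in GB, copied from the benchmark table in the post.
BENCHMARKS = {
    "7B":  {"inference": 5.67,  "training": 8.0},
    "14B": {"inference": 10.08, "training": 14.0},
    "32B": {"inference": 19.47, "training": 26.0},
    "72B": {"inference": 41.54, "training": 52.0},
}


def training_overhead_gb(model: str) -> float:
    """Extra VRAM reported for training on top of inference."""
    row = BENCHMARKS[model]
    return round(row["training"] - row["inference"], 2)


def headroom_gb(model: str, gpu_memory_gb: float, num_gpus: int = 1) -> float:
    """Nominal GPU memory minus reported training VRAM.

    A negative value means the reported figure exceeds the nominal memory.
    """
    return round(gpu_memory_gb * num_gpus - BENCHMARKS[model]["training"], 2)


if __name__ == "__main__":
    for name in BENCHMARKS:
        print(name, "training overhead:", training_overhead_gb(name), "GB")
    print("7B on 8 GB:", headroom_gb("7B", 8))
    print("32B on 24 GB:", headroom_gb("32B", 24))
    print("72B on 40 GB:", headroom_gb("72B", 40))
    print("72B on 2x 24 GB:", headroom_gb("72B", 24, num_gpus=2))
```

Taken at face value, the reported training figures sit at or above the nominal memory of the cards listed, so the pairings presumably assume lighter settings (shorter sequences, smaller batches, or some offloading) that the post does not spell out.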

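Usage:

    from zse.format import load_zse_model
    # save_lora_adapter is assumed to be exported by zse.training;
    # the post does not show its import.
    from zse.training import LoRAConfig, add_lora_to_model, save_lora_adapter

    # Load the INT4 .zse checkpoint, then attach LoRA adapters.
    model, tokenizer, info = load_zse_model("model.zse")
    model = add_lora_to_model(model, LoRAConfig(rank=16, alpha=32))

    # Train normally, adapter is ~25MB
    save_lora_adapter(model, "my_adapter.safetensors")

Trainable params: 0.2% of model (12M params for 7B)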
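The quoted adapter size and trainable fraction are consistent with standard LoRA arithmetic, which the sketch below works through. It does not use ZSE itself. The helper names are hypothetical, 2 bytes per parameter assumes fp16 or bf16 storage (the post does not state the dtype), 7 billion is used as a nominal total, and the 4096 x 4096 projection is an illustrative shape, not a layer the post names.

```python
def lora_params(d_in: int, d_out: int, rank: int) -> int:
    """Trainable parameters LoRA adds to one d_in x d_out weight matrix.

    LoRA learns two low-rank factors: A (rank x d_in) and B (d_out x rank).
    """
    return rank * (d_in + d_out)


def adapter_size_mb(num_params: int, bytes_per_param: int = 2) -> float:
    """Approximate adapter size in MB (10**6 bytes).

    bytes_per_param=2 is an assumption (fp16/bf16 storage).
    """
    return num_params * bytes_per_param / 1e6


def trainable_fraction(num_trainable: int, num_total: int) -> float:
    """Share of the model's parameters that are trained."""
    return num_trainable / num_total


RANK, ALPHA = 16, 32        # values from the post's LoRAConfig
scaling = ALPHA / RANK      # standard LoRA scales the low-rank update by alpha / rank

if __name__ == "__main__":
    trainable = 12_000_000  # "12M params for 7B" from the post
    print(adapter_size_mb(trainable))                              # 24.0
    print(f"{trainable_fraction(trainable, 7_000_000_000):.2%}")   # 0.17%
    print(lora_params(4096, 4096, RANK))                           # 131072
```

At 2 bytes per parameter, 12M parameters come to about 24 MB, in line with the ~25MB adapter, and 12M out of roughly 7B is about 0.17%, which rounds to the quoted 0.2%.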

pip install zllm-zse[training]

Code: github.com/zyora-ai/zse
