Apex-1-flash, 4B LLM finetuned on RTX 5070

Author: Qmay_Dev

by Qmay_Dev·Jun 27, 2026·2 points·0 comments


AI Analysis

●Mid · Ship It

Another 4B Qwen fine-tune without benchmarks showing it beats existing models.

Strengths

•Training on consumer hardware (an RTX 5070) with Unsloth demonstrates accessibility
•Standard integrations with Transformers and llama.cpp
•Open weights on Hugging Face with GGUF quantization

Weaknesses

•No benchmarks comparing performance to other 4B reasoning models
•Thousands of similar fine-tuned models already exist on Hugging Face
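
The missing comparison could start as simply as scoring each model's final answers against a shared answer key. The following is a minimal sketch, assuming exact-match scoring after light normalization; the function names, the model labels `model_a` and `model_b`, and all answers are hypothetical placeholder data, not measured results for Apex-1-flash or any other model.

```python
def normalize(answer: str) -> str:
    """Lowercase and collapse whitespace so trivial formatting differences are ignored."""
    return " ".join(answer.lower().split())


def exact_match_accuracy(predictions: list[str], references: list[str]) -> float:
    """Fraction of predictions that equal their reference after normalization."""
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have the same length")
    if not references:
        raise ValueError("need at least one reference answer")
    hits = sum(normalize(p) == normalize(r) for p, r in zip(predictions, references))
    return hits / len(references)


# Placeholder toy data (hypothetical): stands in for final answers
# extracted from each model's output on the same questions.
references = ["42", "Paris", "7"]
outputs = {
    "model_a": ["42", "paris", "8"],
    "model_b": ["41", "Paris ", "8"],
}

scores = {name: exact_match_accuracy(preds, references) for name, preds in outputs.items()}
for name, score in scores.items():
    print(f"{name}: {score:.2f}")
```

Exact match is a deliberately crude metric. It suits short, closed-form answers and would need replacing for free-form reasoning output.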

Post Description

The goal was to create a highly efficient, small-scale model that can perform reasoning tasks while remaining lightweight enough to run easily on consumer hardware.

Technical Stack

•Base: Qwen3:4B
•Training: Fine-tuned using Unsloth for memory efficiency, which allowed me to run the process smoothly on an RTX 5070.
•Stack: Built with cu128, PyTorch, and Hugging Face Transformers.
•Dataset: Trained on Raymond-dev-546730/Open-CoT-Reasoning-Mini to improve Chain-of-Thought (CoT) capabilities.
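
To see why memory efficiency matters at this scale, a rough back-of-the-envelope estimate of weight storage helps. This is a sketch under stated assumptions: the parameter count is taken as a nominal 4e9 (the post gives only "4B"), the RTX 5070 is taken to have 12 GB of VRAM, and the precisions listed are illustrative, since the post does not say which precision or quantization was used during training. The function name is hypothetical.

```python
def weight_memory_gib(num_params: float, bits_per_param: int) -> float:
    """Memory needed to hold the model weights alone, in GiB."""
    return num_params * bits_per_param / 8 / 2**30


NUM_PARAMS = 4e9   # assumption: nominal "4B"; exact count not stated in the post
VRAM_GIB = 12      # assumption: RTX 5070 with 12 GB, treated here as about 12 GiB

# Illustrative precisions only; the post does not state which one was used.
for label, bits in [("fp32", 32), ("bf16", 16), ("4-bit", 4)]:
    gib = weight_memory_gib(NUM_PARAMS, bits)
    fits = "fits" if gib < VRAM_GIB else "does not fit"
    print(f"{label:>5}: {gib:5.2f} GiB of weights ({fits} in {VRAM_GIB} GiB)")
```

This counts weights only. Gradients, optimizer state, and activations add to the total during training, which is why memory-saving techniques are typically needed to fine-tune a model of this size on a single consumer GPU.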