Apex-1-flash, 4B LLM finetuned on RTX 5070

by Qmay_Dev·Jun 27, 2026·2 points·0 comments

AI Analysis

Mid · Ship It

Another 4B Qwen fine-tune, released without benchmarks showing that it beats existing models.

Strengths
  • Training on consumer hardware with Unsloth demonstrates accessibility
  • Standard integrations with Transformers and llama.cpp
  • Open weights on Hugging Face with GGUF quantization
Weaknesses
  • No benchmarks comparing performance to other 4B reasoning models
  • Thousands of similar fine-tuned models already exist on Hugging Face
Target Audience

Developers running local LLMs

Similar To

Qwen3-4B-Instruct · TinyLlama · Phi-3-mini

Post Description

The goal was to create a highly efficient, small-scale model that can perform reasoning tasks while remaining lightweight enough to run easily on consumer hardware.

Technical Stack
  • Base: Qwen3:4B
  • Training: Fine-tuned using Unsloth for memory efficiency, which allowed me to run the process smoothly on an RTX 5070
  • Stack: Built with cu128, PyTorch, and Hugging Face Transformers
  • Dataset: Trained on Raymond-dev-546730/Open-CoT-Reasoning-Mini to improve Chain-of-Thought (CoT) capabilities
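
As a rough check on the "lightweight enough to run on consumer hardware" goal, the sketch below estimates how much memory the weights alone would need at a few common precisions. It assumes a round 4 billion parameters (taken from the "4B" label, not from the model card), and the helper name `weight_gib` is purely illustrative. Activations, KV cache, and optimizer state are not counted, so real usage will be higher.

```python
# Rough weight-only memory estimate for a model with ~4 billion parameters.
# PARAMS is a round-number assumption based on the "4B" label; the exact
# parameter count of the model may differ.
# Activations, KV cache, and optimizer state are NOT included.

PARAMS = 4_000_000_000


def weight_gib(params: int, bits_per_weight: float) -> float:
    """Return the size of the weights alone, in GiB."""
    return params * bits_per_weight / 8 / 2**30


# Common storage precisions for local inference.
estimates = {
    "fp16/bf16": weight_gib(PARAMS, 16),
    "int8": weight_gib(PARAMS, 8),
    "4-bit": weight_gib(PARAMS, 4),
}

for name, gib in estimates.items():
    print(f"{name:>10}: {gib:.2f} GiB")
```

Under these assumptions, 16-bit weights come to roughly 7.5 GiB and 4-bit weights to under 2 GiB, which is why quantized GGUF builds of 4B models tend to be practical on consumer GPUs.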
