Unsloth Documentation
- Unsloth Docs: Train your own model with Unsloth, an open-source framework for LLM fine-tuning and reinforcement learning.
- Beginner? Start here!
- Unsloth Requirements: Here are Unsloth's requirements including system and GPU VRAM requirements.
- FAQ + Is Fine-tuning Right For Me?: If you're unsure whether fine-tuning is right for you, see here! Learn about fine-tuning misconceptions, how it compares to RAG, and more.
- Unsloth Notebooks: Explore our catalog of Unsloth notebooks:
- All Our Models
- Install & Update: Learn to install Unsloth locally or online.
- Updating: To update or use an old version of Unsloth, follow the steps below:
- Pip Install: To install Unsloth locally via Pip, follow the steps below:
- Docker: Install Unsloth using our official Docker container
- Windows Installation: See how to install Unsloth on Windows with or without WSL.
- AMD: Fine-tune with Unsloth on AMD GPUs.
- Conda Install: To install Unsloth locally on Conda, follow the steps below:
- Google Colab: To install and run Unsloth on Google Colab, follow the steps below:
- Fine-tuning LLMs Guide: Learn all the basics and best practices of fine-tuning. Beginner-friendly.
- What Model Should I Use?
- Datasets Guide: Learn how to create & prepare a dataset for fine-tuning.
- LoRA Hyperparameters Guide: Optimal LoRA rank, alpha, number of epochs, batch size & gradient accumulation, QLoRA vs LoRA, target modules and more!
- Tutorial: How to Finetune Llama-3 and Use In Ollama: Beginner's Guide for creating a customized personal assistant (like ChatGPT) to run locally on Ollama
- Reinforcement Learning (RL) Guide: Learn all about Reinforcement Learning (RL) and how to train your own DeepSeek-R1 reasoning model with Unsloth using GRPO. A complete guide from beginner to advanced.
- Tutorial: Train your own Reasoning model with GRPO: Beginner's Guide to transforming a model like Llama 3.1 (8B) into a reasoning model by using Unsloth and GRPO.
- Advanced RL Documentation: Advanced settings and documentation for using Unsloth with GRPO.
- Memory Efficient RL
- RL Reward Hacking: Learn what Reward Hacking is in Reinforcement Learning and how to counter it.
- GSPO Reinforcement Learning: Train with GSPO (Group Sequence Policy Optimization) RL in Unsloth.
- Reinforcement Learning - DPO, ORPO & KTO: To use the reward modelling functions for DPO, GRPO, ORPO or KTO with Unsloth, follow the steps below:
- DeepSeek-OCR: How to Run & Fine-tune: Guide on how to run and fine-tune DeepSeek-OCR locally.
- How to Fine-tune LLMs with Unsloth & Docker: Learn how to fine-tune LLMs or do Reinforcement Learning (RL) with Unsloth's Docker image.
- Vision Reinforcement Learning (VLM RL): Train Vision/multimodal models via GRPO and RL with Unsloth!
- gpt-oss Reinforcement Learning
- Tutorial: How to Train gpt-oss with RL: Learn to train OpenAI gpt-oss with GRPO to autonomously beat 2048 locally or on Colab.
- Unsloth Dynamic GGUFs on Aider Polyglot: Performance of Unsloth Dynamic GGUFs on Aider Polyglot Benchmarks
- Qwen3-VL: How to Run & Fine-tune: Learn to fine-tune and run Qwen3-VL locally with Unsloth.
- gpt-oss: How to Run & Fine-tune: Run & fine-tune OpenAI's new open-source models!
- Tutorial: How to Fine-tune gpt-oss: Learn step-by-step how to train OpenAI gpt-oss locally with Unsloth.
- Long Context gpt-oss Training
- GLM-4.6: How to Run Locally: A guide on how to run Z.ai's new GLM-4.6 model on your own local device!
- IBM Granite 4.0: How to run IBM Granite-4.0 with Unsloth GGUFs on llama.cpp and Ollama, and how to fine-tune it!
- DeepSeek-V3.1: How to Run Locally: A guide on how to run DeepSeek-V3.1 and Terminus on your own local device!
- Qwen3-Coder: How to Run Locally: Run Qwen3-Coder-30B-A3B-Instruct and 480B-A35B locally with Unsloth Dynamic quants.
- Gemma 3: How to Run & Fine-tune: How to run Gemma 3 effectively with our GGUFs on llama.cpp, Ollama and Open WebUI, and how to fine-tune it with Unsloth!
- Gemma 3n: How to Run & Fine-tune: Run Google's new Gemma 3n locally with Dynamic GGUFs on llama.cpp, Ollama, Open WebUI and fine-tune with Unsloth!
- Qwen3: How to Run & Fine-tune: Learn to run & fine-tune Qwen3 locally with Unsloth + our Dynamic 2.0 quants
- Qwen3-2507: Run Qwen3-30B-A3B-2507 and 235B-A22B Thinking and Instruct versions locally on your device!
- Tutorials: How To Fine-tune & Run LLMs: Learn how to run and fine-tune models for optimal performance 100% locally with Unsloth.
- DeepSeek-R1-0528: How to Run Locally: A guide on how to run DeepSeek-R1-0528, including the Qwen3-8B distill, on your own local device!
- Magistral: How to Run & Fine-tune: Meet Magistral - Mistral's new reasoning models.
- Llama 4: How to Run & Fine-tune: How to run Llama 4 locally using our dynamic GGUFs, which recover accuracy compared to standard quantization.
- Kimi K2: How to Run Locally: Guide on running Kimi K2 and Kimi-K2-Instruct-0905 on your own local device!
- Grok 2: Run xAI's Grok 2 model locally!
- Devstral: How to Run & Fine-tune: Run and fine-tune Mistral Devstral 1.1, including Small-2507 and 2505.
- DeepSeek-V3-0324: How to Run Locally: How to run DeepSeek-V3-0324 locally using our dynamic quants, which recover accuracy compared to standard quantization.
- DeepSeek-R1: How to Run Locally: A guide on how you can run our 1.58-bit Dynamic Quants for DeepSeek-R1 using llama.cpp.
- DeepSeek-R1 Dynamic 1.58-bit: See performance comparison tables for Unsloth's Dynamic GGUF Quants vs Standard IMatrix Quants.
- QwQ-32B: How to Run Effectively: How to run QwQ-32B effectively with our bug fixes and without endless generations + GGUFs.
- Phi-4 Reasoning: How to Run & Fine-tune: Learn to run & fine-tune Phi-4 reasoning models locally with Unsloth + our Dynamic 2.0 quants
- Running & Saving Models: Learn how to save your finetuned model so you can run it in your favorite inference engine.
- Saving to GGUF: Saving models to 16bit for GGUF so you can use them with Ollama, Jan AI, Open WebUI and more!
- Saving to Ollama
- Saving to vLLM for deployment: Saving models to 16bit for vLLM deployment and serving
- Saving to SGLang for deployment: Saving models to 16bit for SGLang deployment and serving.
- Unsloth Inference: Learn how to run your finetuned model with Unsloth's faster inference.
- Troubleshooting Inference: If you're experiencing issues when running or saving your model.
- vLLM Engine Arguments
- LoRA Hot Swapping Guide
- Text-to-Speech (TTS) Fine-tuning: Learn how to fine-tune TTS & STT voice models with Unsloth.
- Unsloth Dynamic 2.0 GGUFs: A big new upgrade to our Dynamic Quants!
- Vision Fine-tuning: Learn how to fine-tune vision/multimodal LLMs with Unsloth
- Fine-tuning LLMs with NVIDIA DGX Spark and Unsloth: Tutorial on how to fine-tune and do reinforcement learning (RL) with OpenAI gpt-oss on NVIDIA DGX Spark.
- Fine-tuning LLMs with Blackwell, RTX 50 series & Unsloth: Learn how to fine-tune LLMs on NVIDIA's Blackwell RTX 50 series and B200 GPUs with our step-by-step guide.
- Multi-GPU Training with Unsloth: Learn how to fine-tune LLMs across multiple GPUs with parallelism in Unsloth.
- Finetuning from Last Checkpoint: Checkpointing allows you to save your finetuning progress so you can pause it and then continue.
- Troubleshooting & FAQs: Tips to solve issues, and frequently asked questions.
- Chat Templates: Learn the fundamentals and customization options of chat templates, including Conversational, ChatML, ShareGPT, Alpaca formats, and more!
- Quantization-Aware Training (QAT): Quantize models to 4-bit with Unsloth and PyTorch to recover accuracy.
- Unsloth Environment Flags: Advanced flags which may be useful if you see breaking finetunes or want to turn features off.
- Continued Pretraining: Also known as continued finetuning. Unsloth allows you to continually pretrain so a model can learn a new language.
- Unsloth Benchmarks: Unsloth recorded benchmarks on NVIDIA GPUs.
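As a quick illustration of the batch size and gradient accumulation relationship covered in the LoRA Hyperparameters Guide above, here is a minimal sketch (not Unsloth code; the function name is illustrative) of how the effective batch size is typically computed:

```python
# Minimal sketch: gradient accumulation lets a small per-device batch
# behave like a larger one, trading memory for extra forward/backward passes.
def effective_batch_size(per_device_batch_size: int,
                         gradient_accumulation_steps: int,
                         num_gpus: int = 1) -> int:
    """Effective batch size = per-device batch * accumulation steps * GPUs."""
    return per_device_batch_size * gradient_accumulation_steps * num_gpus

# e.g. a per-device batch of 2 with 4 accumulation steps on one GPU
# updates weights as if the batch size were 8
print(effective_batch_size(2, 4))  # -> 8
```

This is why the guide treats batch size and gradient accumulation together: raising either one increases the effective batch size, but only raising the per-device batch size increases VRAM usage.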