Community Tutorials

Community tutorials are made by active members of the Hugging Face community who want to share their knowledge and expertise with others. They are a great way to learn about the library and its features, and to get started with core classes and modalities.

Language Models

| Task | Class | Description | Author | Tutorial | Colab |
| --- | --- | --- | --- | --- | --- |
| Reinforcement Learning | [GRPOTrainer] | Post training an LLM for reasoning with GRPO in TRL | Sergio Paniego | Link | Open In Colab |
| Reinforcement Learning | [GRPOTrainer] | Mini-R1: Reproduce Deepseek R1 „aha moment“ a RL tutorial | Philipp Schmid | Link | Open In Colab |
| Reinforcement Learning | [GRPOTrainer] | RL on LLaMA 3.1-8B with GRPO and Unsloth optimizations | Andrea Manzoni | Link | Open In Colab |
| Instruction tuning | [SFTTrainer] | Fine-tuning Google Gemma LLMs using ChatML format with QLoRA | Philipp Schmid | Link | Open In Colab |
| Structured Generation | [SFTTrainer] | Fine-tuning Llama-2-7B to generate Persian product catalogs in JSON using QLoRA and PEFT | Mohammadreza Esmaeilian | Link | Open In Colab |
| Preference Optimization | [DPOTrainer] | Align Mistral-7b using Direct Preference Optimization for human preference alignment | Maxime Labonne | Link | Open In Colab |
| Preference Optimization | [ORPOTrainer] | Fine-tuning Llama 3 with ORPO combining instruction tuning and preference alignment | Maxime Labonne | Link | Open In Colab |
| Instruction tuning | [SFTTrainer] | How to fine-tune open LLMs in 2025 with Hugging Face | Philipp Schmid | Link | Open In Colab |

Vision Language Models

| Task | Class | Description | Author | Tutorial | Colab |
| --- | --- | --- | --- | --- | --- |
| Visual QA | [SFTTrainer] | Fine-tuning Qwen2-VL-7B for visual question answering on ChartQA dataset | Sergio Paniego | Link | Open In Colab |
| Visual QA | [SFTTrainer] | Fine-tuning SmolVLM with TRL on a consumer GPU | Sergio Paniego | Link | Open In Colab |
| SEO Description | [SFTTrainer] | Fine-tuning Qwen2-VL-7B for generating SEO-friendly descriptions from images | Philipp Schmid | Link | Open In Colab |
| Visual QA | [DPOTrainer] | PaliGemma 🤝 Direct Preference Optimization | Merve Noyan | Link | Open In Colab |
| Visual QA | [DPOTrainer] | Fine-tuning SmolVLM using direct preference optimization (DPO) with TRL on a consumer GPU | Sergio Paniego | Link | Open In Colab |

Contributing

If you have a tutorial that you would like to add to this list, please open a PR that adds it. We will review the PR and merge it if the tutorial is relevant to the community.