# Command Line Interfaces (CLIs)

TRL provides a powerful command-line interface (CLI) to fine-tune large language models (LLMs) using methods like Supervised Fine-Tuning (SFT), Direct Preference Optimization (DPO), and more. The CLI abstracts away much of the boilerplate, letting you launch training jobs quickly and reproducibly.

Currently supported commands are:

#### Training Commands

- `trl dpo`: fine-tune an LLM with DPO
- `trl grpo`: fine-tune an LLM with GRPO
- `trl kto`: fine-tune an LLM with KTO
- `trl sft`: fine-tune an LLM with SFT

#### Other Commands

- `trl env`: get the system information
- `trl vllm-serve`: serve a model with vLLM (see the example below)
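
For example, `trl vllm-serve` can expose a model behind a vLLM server, which online methods such as GRPO can query for fast generation during training. A minimal sketch (the model name is illustrative):

```bash
# Serve a model with vLLM from the TRL CLI
trl vllm-serve --model Qwen/Qwen2.5-0.5B
```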
## Fine-Tuning with the TRL CLI

### Basic Usage

You can launch training directly from the CLI by specifying required arguments like the model and dataset:

<hfoptions id="command_line">
<hfoption id="SFT">

```bash
trl sft \
    --model_name_or_path Qwen/Qwen2.5-0.5B \
    --dataset_name stanfordnlp/imdb
```

</hfoption>
<hfoption id="DPO">

```bash
trl dpo \
    --model_name_or_path Qwen/Qwen2.5-0.5B \
    --dataset_name anthropic/hh-rlhf
```

</hfoption>
</hfoptions>
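
Beyond the required arguments, other options of the underlying trainer configuration can be passed as flags in the same way. A short sketch (the flag names follow the standard 🤗 Transformers `TrainingArguments`; the hyperparameter values are illustrative):

```bash
trl sft \
    --model_name_or_path Qwen/Qwen2.5-0.5B \
    --dataset_name stanfordnlp/imdb \
    --learning_rate 2e-5 \
    --num_train_epochs 1 \
    --output_dir my-sft-model
```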
### Using Configuration Files

To keep your CLI commands clean and reproducible, you can define all training arguments in a YAML configuration file:

<hfoptions id="config_file">
<hfoption id="SFT">

```yaml
# sft_config.yaml
model_name_or_path: Qwen/Qwen2.5-0.5B
dataset_name: stanfordnlp/imdb
```

Launch with:

```bash
trl sft --config sft_config.yaml
```

</hfoption>
<hfoption id="DPO">

```yaml
# dpo_config.yaml
model_name_or_path: Qwen/Qwen2.5-0.5B
dataset_name: anthropic/hh-rlhf
```

Launch with:

```bash
trl dpo --config dpo_config.yaml
```

</hfoption>
</hfoptions>
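
Since the YAML keys mirror the CLI flags, the same file can capture the full training setup in one place. A sketch with a few extra keys (values are illustrative):

```yaml
# sft_config.yaml
model_name_or_path: Qwen/Qwen2.5-0.5B
dataset_name: stanfordnlp/imdb
learning_rate: 2.0e-5
num_train_epochs: 1
output_dir: my-sft-model
```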
### Scaling Up with Accelerate

The TRL CLI natively supports [🤗 Accelerate](https://huggingface.co/docs/accelerate), making it easy to scale training across multiple GPUs or machines, or to use advanced setups like DeepSpeed, all from the same CLI.

You can pass any `accelerate launch` arguments directly to `trl`, such as `--num_processes`. For more information, see [Using accelerate launch](https://huggingface.co/docs/accelerate/en/basic_tutorials/launch#using-accelerate-launch).

<hfoptions id="launch_args">
<hfoption id="SFT inline">

```bash
trl sft \
    --model_name_or_path Qwen/Qwen2.5-0.5B \
    --dataset_name stanfordnlp/imdb \
    --num_processes 4
```

</hfoption>
<hfoption id="SFT w/ config file">

```yaml
# sft_config.yaml
model_name_or_path: Qwen/Qwen2.5-0.5B
dataset_name: stanfordnlp/imdb
num_processes: 4
```

Launch with:

```bash
trl sft --config sft_config.yaml
```

</hfoption>
<hfoption id="DPO inline">

```bash
trl dpo \
    --model_name_or_path Qwen/Qwen2.5-0.5B \
    --dataset_name anthropic/hh-rlhf \
    --num_processes 4
```

</hfoption>
<hfoption id="DPO w/ config file">

```yaml
# dpo_config.yaml
model_name_or_path: Qwen/Qwen2.5-0.5B
dataset_name: anthropic/hh-rlhf
num_processes: 4
```

Launch with:

```bash
trl dpo --config dpo_config.yaml
```

</hfoption>
</hfoptions>
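
Because any `accelerate launch` argument is accepted, multi-node runs work the same way. A sketch for two machines (the flags are standard `accelerate launch` options; the IP address and port are placeholders for your own cluster):

```bash
# Run this on the first machine (rank 0)...
trl sft \
    --model_name_or_path Qwen/Qwen2.5-0.5B \
    --dataset_name stanfordnlp/imdb \
    --num_processes 8 \
    --num_machines 2 \
    --machine_rank 0 \
    --main_process_ip 192.168.0.1 \
    --main_process_port 29500
# ...and the same command with --machine_rank 1 on the second machine.
```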
### Using `--accelerate_config` for Accelerate Configuration

The `--accelerate_config` flag lets you easily configure distributed training with [🤗 Accelerate](https://github.com/huggingface/accelerate). This flag accepts either:

* the name of a predefined config profile (built into TRL), or
* a path to a custom Accelerate YAML config file.

#### Predefined Config Profiles

TRL provides several ready-to-use Accelerate configs to simplify common training setups:

| Name         | Description                         |
| ------------ | ----------------------------------- |
| `fsdp1`      | Fully Sharded Data Parallel Stage 1 |
| `fsdp2`      | Fully Sharded Data Parallel Stage 2 |
| `zero1`      | DeepSpeed ZeRO Stage 1              |
| `zero2`      | DeepSpeed ZeRO Stage 2              |
| `zero3`      | DeepSpeed ZeRO Stage 3              |
| `multi_gpu`  | Multi-GPU training                  |
| `single_gpu` | Single-GPU training                 |

To use one of these, just pass the name to `--accelerate_config`. TRL will automatically load the corresponding config file from `trl/accelerate_config/`.
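
If none of the profiles fit, you can point `--accelerate_config` at your own file instead. A minimal sketch of a custom Accelerate config (the keys follow the standard `accelerate` config format, as in the `trl env` output shown later on this page; values are illustrative):

```yaml
# my_accelerate_config.yaml
compute_environment: LOCAL_MACHINE
distributed_type: MULTI_GPU
num_processes: 4
mixed_precision: bf16
```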
#### Example Usage

<hfoptions id="accelerate_config">
<hfoption id="SFT inline">

```bash
trl sft \
    --model_name_or_path Qwen/Qwen2.5-0.5B \
    --dataset_name stanfordnlp/imdb \
    --accelerate_config zero2  # or path/to/my/accelerate/config.yaml
```

</hfoption>
<hfoption id="SFT w/ config file">

```yaml
# sft_config.yaml
model_name_or_path: Qwen/Qwen2.5-0.5B
dataset_name: stanfordnlp/imdb
accelerate_config: zero2  # or path/to/my/accelerate/config.yaml
```

Launch with:

```bash
trl sft --config sft_config.yaml
```

</hfoption>
<hfoption id="DPO inline">

```bash
trl dpo \
    --model_name_or_path Qwen/Qwen2.5-0.5B \
    --dataset_name anthropic/hh-rlhf \
    --accelerate_config zero2  # or path/to/my/accelerate/config.yaml
```

</hfoption>
<hfoption id="DPO w/ config file">

```yaml
# dpo_config.yaml
model_name_or_path: Qwen/Qwen2.5-0.5B
dataset_name: anthropic/hh-rlhf
accelerate_config: zero2  # or path/to/my/accelerate/config.yaml
```

Launch with:

```bash
trl dpo --config dpo_config.yaml
```

</hfoption>
</hfoptions>
## Getting the System Information

You can get the system information by running the following command:

```bash
trl env
```

This will print out the system information, including the GPU information, the CUDA version, the PyTorch version, the Transformers version, the TRL version, and any optional dependencies that are installed.

```txt
Copy-paste the following information when reporting an issue:

- Platform: Linux-5.15.0-1048-aws-x86_64-with-glibc2.31
- Python version: 3.11.9
- PyTorch version: 2.4.1
- accelerator(s): NVIDIA H100 80GB HBM3
- Transformers version: 4.45.0.dev0
- Accelerate version: 0.34.2
- Accelerate config:
  - compute_environment: LOCAL_MACHINE
  - distributed_type: DEEPSPEED
  - mixed_precision: no
  - use_cpu: False
  - debug: False
  - num_processes: 4
  - machine_rank: 0
  - num_machines: 1
  - rdzv_backend: static
  - same_network: True
  - main_training_function: main
  - enable_cpu_affinity: False
  - deepspeed_config: {'gradient_accumulation_steps': 4, 'offload_optimizer_device': 'none', 'offload_param_device': 'none', 'zero3_init_flag': False, 'zero_stage': 2}
  - downcast_bf16: no
  - tpu_use_cluster: False
  - tpu_use_sudo: False
  - tpu_env: []
- Datasets version: 3.0.0
- HF Hub version: 0.24.7
- TRL version: 0.12.0.dev0+acb4d70
- bitsandbytes version: 0.41.1
- DeepSpeed version: 0.15.1
- Diffusers version: 0.30.3
- Liger-Kernel version: 0.3.0
- LLM-Blender version: 0.0.2
- OpenAI version: 1.46.0
- PEFT version: 0.12.0
- vLLM version: not installed
```
This information is required when reporting an issue. | |