---
license: apache-2.0
---

# InfiR2-1.5B-Instruct-FP8

📄 Paper | 🐙 Github | 🌐 Project Website

We performed two-stage supervised fine-tuning on **InfiR2-1.5B-base-FP8** in FP8 format, using the InfiAlign-SFT-72k and InfiAlign-SFT-165k datasets.

**Training Recipe**:

- Stable and Reproducible Performance
- Efficient and Low-Memory Training

**Hyperparameters**:

| Parameter | Value |
| :---: | :---: |
| **Batch Size** | 64 |
| **Learning Rate** | 5e-5 |
| **Minimum Learning Rate** | 5e-6 |
| **Weight Decay** | 0.05 |
| **Context Length** | 32k |
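
For clarity, here is a minimal sketch that collects the hyperparameters above into a single Python config object. The `SFTConfig` class and its field names are illustrative assumptions, not the project's actual training interface.

```python
from dataclasses import dataclass

# Hypothetical container for the SFT hyperparameters listed above;
# field names are illustrative, not the project's actual trainer API.
@dataclass
class SFTConfig:
    batch_size: int = 64              # global batch size
    learning_rate: float = 5e-5       # peak learning rate
    min_learning_rate: float = 5e-6   # floor the LR schedule decays to
    weight_decay: float = 0.05
    context_length: int = 32 * 1024   # 32k-token context window

config = SFTConfig()
print(config)
```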
The resulting model is **InfiR2-1.5B-Instruct-FP8**.

## 🚀 InfiR2 Model Series

The InfiR2 framework offers multiple model variants with different sizes and training strategies:

- **1.5B**
  - [InfiR2-1.5B-base-FP8](https://huggingface.co/InfiX-ai/InfiR2-1.5B-base-FP8): *Continued pretraining on Qwen2.5-1.5B-base*
  - [InfiR2-1.5B-Instruct-FP8](https://huggingface.co/InfiX-ai/InfiR2-1.5B-Instruct-FP8): *Supervised fine-tuning on InfiR2-1.5B-base-FP8 with the [InfiAlign dataset](https://huggingface.co/papers/2508.05496)*
- **7B**
  - [InfiR2-7B-base-FP8](https://huggingface.co/InfiX-ai/InfiR2-7B-base-FP8): *Continued pretraining on Qwen2.5-7B-base*
  - [InfiR2-7B-Instruct-FP8](https://huggingface.co/InfiX-ai/InfiR2-7B-Instruct-FP8): *Supervised fine-tuning on InfiR2-7B-base-FP8 with the [InfiAlign dataset](https://huggingface.co/papers/2508.05496)*
  - [InfiR2-R1-7B-FP8-Preview](https://huggingface.co/InfiX-ai/InfiR2-R1-7B-FP8-Preview): *Multi-stage FP8 reinforcement learning*

## 📊 Model Performance

Below is a performance comparison of InfiR2-1.5B-Instruct-FP8 on reasoning benchmarks. Note: "w. InfiAlign" denotes supervised fine-tuning (SFT) with the InfiAlign dataset.
| Model | AIME 25 | AIME 24 | GPQA | LiveCodeBench v5 |
| :--- | :---: | :---: | :---: | :---: |
| Deepseek-Distill-Qwen-1.5B | 21.35 | 26.87 | 32.26 | 18.50 |
| Qwen2.5-1.5B-base (w. InfiAlign) | 14.58 | 10.52 | 28.98 | 12.99 |
| InfiR2-1.5B-Instruct-FP8 | 18.45 | 17.39 | 29.48 | 17.10 |
## 🎭 Quick Start

```python
from vllm import LLM, SamplingParams

MODEL_NAME = "InfiX-ai/InfiR2-1.5B-Instruct-FP8"
prompt_text = "Briefly explain what a black hole is, and provide two interesting facts."
MAX_NEW_TOKENS = 256
TEMPERATURE = 0.8

# Load the FP8 checkpoint; "auto" lets vLLM select the appropriate dtype.
llm = LLM(
    model=MODEL_NAME,
    dtype="auto",
)

sampling_params = SamplingParams(
    n=1,
    temperature=TEMPERATURE,
    max_tokens=MAX_NEW_TOKENS,
)

# Format the prompt with the model's chat template.
tokenizer = llm.get_tokenizer()
messages = [
    {"role": "user", "content": prompt_text}
]
prompt_formatted = tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)

outputs = llm.generate(prompt_formatted, sampling_params)
generated_text = outputs[0].outputs[0].text
llm_response = generated_text.strip()

print("\n" + "=" * 70)
print(f"Prompt: \n{prompt_text}")
print("-" * 70)
print(f"(LLM Response): \n{llm_response}")
print("=" * 70)
```

## 📚 Model Download

```bash
# Create a directory for models
mkdir -p ./models

# Download the InfiR2-1.5B-Instruct-FP8 model
huggingface-cli download --resume-download InfiX-ai/InfiR2-1.5B-Instruct-FP8 --local-dir ./models/InfiR2-1.5B-Instruct-FP8
```

A programmatic alternative using `huggingface_hub` is sketched at the end of this card.

## 🎯 Intended Uses

### ✅ Direct Use

This model is intended for research and commercial use. Example use cases include:

- Instruction following
- Mathematical reasoning
- Code generation
- General reasoning

### ❌ Out-of-Scope Use

The model should **not** be used for:

- Generating harmful, offensive, or inappropriate content
- Creating misleading information

## 🙏 Acknowledgements

We would like to express our gratitude to the following open-source projects: [Slime](https://github.com/THUDM/slime), [Megatron](https://github.com/NVIDIA/Megatron-LM), [TransformerEngine](https://github.com/NVIDIA/TransformerEngine), and [Qwen2.5](https://github.com/QwenLM/Qwen2.5-Math).

## 📌 Citation

If you find our work useful, please cite:

```bibtex
@misc{wang2025infir2comprehensivefp8training,
      title={InfiR2: A Comprehensive FP8 Training Recipe for Reasoning-Enhanced Language Models},
      author={Wenjun Wang and Shuo Cai and Congkai Xie and Mingfa Feng and Yiming Zhang and Zhen Li and Kejing Yang and Ming Li and Jiannong Cao and Hongxia Yang},
      year={2025},
      eprint={2509.22536},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2509.22536},
}
```
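
As referenced in the Model Download section above, the checkpoint can also be fetched programmatically. This is a minimal sketch using the `huggingface_hub` library; the local directory simply mirrors the CLI example and is otherwise arbitrary.

```python
from huggingface_hub import snapshot_download

# Download the full model repository; already-downloaded files are skipped,
# so interrupted downloads resume where they left off.
local_path = snapshot_download(
    repo_id="InfiX-ai/InfiR2-1.5B-Instruct-FP8",
    local_dir="./models/InfiR2-1.5B-Instruct-FP8",
)
print(f"Model downloaded to: {local_path}")
```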