---
license: apache-2.0
language:
- en
tags:
- reinforcement-learning
- mathematics
- dllm
- bgpo
- llada
size_categories:
- 8B
---

# LLaDA-8B-BGPO-math

[![Paper](https://img.shields.io/badge/Paper-arXiv:2510.11683-red)](https://arxiv.org/abs/2510.11683)
[![Code](https://img.shields.io/badge/Code-GitHub-blue)](https://github.com/THU-KEG/BGPO)

## Model Description

**LLaDA-8B-BGPO-math** is an 8-billion-parameter diffusion large language model (dLLM), fine-tuned from LLaDA-8B-Instruct with Boundary-Guided Policy Optimization (BGPO) to strengthen its mathematical reasoning.

## Model Details

- **Model Type**: Diffusion Large Language Model (dLLM)
- **Parameters**: 8 billion
- **Training Method**: Boundary-Guided Policy Optimization (BGPO)
- **Base Model**: LLaDA-8B-Instruct
- **Task**: Mathematics
- **Language**: English

## Training Details

- **Training Steps**: 700
- **Response Length**: 512 tokens
- **Train Diffusion Steps**: 256
- **Eval Diffusion Steps**: 512
- **Block Size**: 32
- **Monte Carlo Sample Size ($n_t$)**: 16
- **Learning Rate**: 5e-7
- **Batch Size**: 16
- **Framework**: Built on VeRL (Volcengine Reinforcement Learning)

## Usage & Limitations

- The model is primarily designed for mathematical tasks; performance on other tasks may vary.
- Inference requires appropriate computational resources (e.g., a GPU with enough memory for an 8B model).
- A minimal loading sketch is shown below.
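The sketch below shows how the checkpoint might be loaded for inference, assuming it follows the standard LLaDA Hugging Face convention (custom modeling code loaded with `trust_remote_code=True`). The repository id and prompt are illustrative; the actual diffusion sampling loop is not part of the standard `transformers` API and is provided by the code release linked above.

```python
# Minimal loading sketch (assumptions: the repo id below is illustrative,
# and the checkpoint uses LLaDA's custom modeling code via trust_remote_code).
import torch
from transformers import AutoModel, AutoTokenizer

model_id = "THU-KEG/LLaDA-8B-BGPO-math"  # hypothetical repository id

tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModel.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    trust_remote_code=True,
).to("cuda").eval()

# Build a chat-formatted prompt for the instruct-style model.
messages = [{"role": "user", "content": "Solve: If 3x + 5 = 20, what is x?"}]
prompt = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, tokenize=False
)
input_ids = tokenizer(prompt, return_tensors="pt").input_ids.to("cuda")

# Decoding is performed by the block-wise diffusion sampler shipped with the
# BGPO/LLaDA code release (see the GitHub link above), e.g. with 512 diffusion
# steps and block size 32, matching the evaluation settings listed above.
```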