---
license: apache-2.0
language:
- en
tags:
- reinforcement-learning
- mathematics
- dllm
- bgpo
- llada
size_categories:
- 8B
---

# LLaDA-8B-BGPO-math

[Paper](https://arxiv.org/abs/2510.11683)
[Code](https://github.com/THU-KEG/BGPO)

## Model Description

**LLaDA-8B-BGPO-math** is an 8-billion-parameter diffusion large language model (dLLM), fine-tuned from LLaDA-8B-Instruct with Boundary-Guided Policy Optimization (BGPO) to strengthen its mathematical reasoning.

## Model Details

- **Model Type**: Diffusion Large Language Model (dLLM)
- **Parameters**: 8 billion
- **Training Method**: Boundary-Guided Policy Optimization (BGPO)
- **Base Model**: LLaDA-8B-Instruct
- **Task**: Mathematics
- **Language**: English

## Training Details

- **Training Steps**: 700
- **Response Length**: 512 tokens
- **Train Diffusion Steps**: 256
- **Eval Diffusion Steps**: 512
- **Block Size**: 32
- **Monte Carlo Sample Size ($n_t$)**: 16
- **Learning Rate**: 5e-7
- **Batch Size**: 16
- **Framework**: Built on VeRL (Volcengine Reinforcement Learning)
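
The Monte Carlo sample size $n_t = 16$ above controls how many samples BGPO draws to estimate a per-timestep expectation. As a rough illustration of that idea only (this is not the official BGPO code, and `mc_estimate`/`draw` are hypothetical names), a Monte Carlo average over $n_t$ draws approximates an expectation:

```python
# Illustrative sketch only -- not the official BGPO implementation. It shows
# how an average over n_t samples approximates an expectation, mirroring the
# "Monte Carlo Sample Size (n_t) = 16" setting above.
import random


def mc_estimate(f, draw, n_t=16, seed=0):
    """Approximate E[f(x)] by averaging f over n_t draws from `draw`."""
    rng = random.Random(seed)
    return sum(f(draw(rng)) for _ in range(n_t)) / n_t


# Toy check: estimate E[x^2] for x ~ Uniform(0, 1); the true value is 1/3.
estimate = mc_estimate(lambda x: x * x, lambda rng: rng.random(), n_t=16)
```

In general, a larger $n_t$ lowers the variance of such an estimate at a proportionally higher compute cost, which is the trade-off a setting like $n_t = 16$ balances.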

## Usage & Limitations

- Primarily designed for mathematical tasks.
- Performance may vary on other tasks.
- Requires appropriate computational resources for inference.