---
license: apache-2.0
language:
- en
tags:
- reinforcement-learning
- mathematics
- dllm
- bgpo
- llada
size_categories:
- 8B
---

# LLaDA-8B-BGPO-math

[Paper](https://arxiv.org/abs/2510.11683)
[Code](https://github.com/THU-KEG/BGPO)

## Model Description

**LLaDA-8B-BGPO-math** is an 8-billion-parameter diffusion large language model (dLLM), fine-tuned from LLaDA-8B-Instruct with Boundary-Guided Policy Optimization (BGPO) to strengthen its mathematical reasoning.

## Model Details

- **Model Type**: Diffusion Large Language Model (dLLM)
- **Parameters**: 8 billion
- **Training Method**: Boundary-Guided Policy Optimization (BGPO)
- **Base Model**: LLaDA-8B-Instruct
- **Task**: Mathematics
- **Language**: English

## Training Details

- **Training Steps**: 700
- **Response Length**: 512 tokens
- **Train Diffusion Steps**: 256
- **Eval Diffusion Steps**: 512
- **Block Size**: 32
- **Monte Carlo Sample Size ($n_t$)**: 16
- **Learning Rate**: 5e-7
- **Batch Size**: 16
- **Framework**: Built on VeRL (Volcengine Reinforcement Learning)
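
The Monte Carlo sample size $n_t = 16$ above controls how many samples BGPO draws to estimate a per-timestep expectation. As a rough illustration of that idea only (this is not the official BGPO code, and `mc_estimate`/`draw` are hypothetical names), a Monte Carlo average over $n_t$ draws approximates an expectation:

```python
# Illustrative sketch only -- not the official BGPO implementation. It shows
# how an average over n_t samples approximates an expectation, mirroring the
# "Monte Carlo Sample Size (n_t) = 16" setting above.
import random


def mc_estimate(f, draw, n_t=16, seed=0):
    """Approximate E[f(x)] by averaging f over n_t draws from `draw`."""
    rng = random.Random(seed)
    return sum(f(draw(rng)) for _ in range(n_t)) / n_t


# Toy check: estimate E[x^2] for x ~ Uniform(0, 1); the true value is 1/3.
estimate = mc_estimate(lambda x: x * x, lambda rng: rng.random(), n_t=16)
```

In general, a larger $n_t$ lowers the variance of such an estimate at a proportionally higher compute cost, which is the trade-off a setting like $n_t = 16$ balances.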

## Usage & Limitations

- Primarily designed for mathematical tasks.
- Performance may vary on other tasks.
- Requires appropriate computational resources for inference.