---
license: apache-2.0
language:
  - en
tags:
  - reinforcement-learning
  - mathematics
  - dllm
  - bgpo
  - llada
size_categories:
  - 8B
---

# LLaDA-8B-BGPO-math

Paper | Code

## Model Description

LLaDA-8B-BGPO-math is an 8-billion-parameter diffusion large language model (dLLM), fine-tuned from LLaDA-8B-Instruct with Boundary-Guided Policy Optimization (BGPO) to strengthen its mathematical reasoning capabilities.

## Model Details

- **Model Type:** Diffusion Large Language Model (dLLM)
- **Parameters:** 8 billion
- **Training Method:** Boundary-Guided Policy Optimization (BGPO)
- **Base Model:** LLaDA-8B-Instruct
- **Task:** Mathematics
- **Language:** English

## Training Details

- **Training Steps:** 700
- **Response Length:** 512 tokens
- **Train Diffusion Steps:** 256
- **Eval Diffusion Steps:** 512
- **Block Size:** 32
- **Monte Carlo Sample Size ($n_t$):** 16
- **Learning Rate:** 5e-7
- **Batch Size:** 16
- **Framework:** Built on VeRL (Volcengine Reinforcement Learning); the sketch below collects these settings into one configuration.
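
For reference, the hyperparameters above can be gathered into a single configuration object. The sketch below is a minimal, hypothetical illustration: the key names are assumptions chosen for readability and do not reflect the actual VeRL/BGPO configuration schema.

```python
# Hypothetical training configuration mirroring the hyperparameters listed
# above. Key names are illustrative, NOT the actual VeRL/BGPO schema.
bgpo_math_config = {
    "base_model": "GSAI-ML/LLaDA-8B-Instruct",
    "algorithm": "bgpo",
    "total_training_steps": 700,
    "response_length": 512,         # max generated tokens per response
    "train_diffusion_steps": 256,   # denoising steps during training rollouts
    "eval_diffusion_steps": 512,    # denoising steps during evaluation
    "block_size": 32,               # block length for blockwise generation
    "monte_carlo_sample_size": 16,  # n_t: samples for the Monte Carlo estimate
    "learning_rate": 5e-7,
    "train_batch_size": 16,
}
```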

## Usage & Limitations

- Primarily designed for mathematical tasks.
- Performance may vary on other tasks and domains.
- Inference requires a GPU with enough memory for an 8B-parameter model (about 16 GB for the weights alone in bfloat16); a loading sketch follows below.
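
As a rough starting point, the snippet below shows one way to load the model, following the loading convention of the base LLaDA-8B-Instruct card. It is a sketch under assumptions: the repo id `linny2002/LLaDA-8B-BGPO-math` is inferred from this card, and the diffusion sampling step (the `generate` helper with `steps` and `block_length` arguments) comes from the LLaDA reference code, not from this repository.

```python
import torch
from transformers import AutoModel, AutoTokenizer

# Repo id inferred from this card; adjust if the model lives elsewhere.
model_id = "linny2002/LLaDA-8B-BGPO-math"

tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModel.from_pretrained(
    model_id,
    trust_remote_code=True,       # LLaDA ships custom modeling code
    torch_dtype=torch.bfloat16,
).to("cuda").eval()

# Build a chat-formatted math prompt.
messages = [{"role": "user", "content": "What is 15 * 17?"}]
prompt = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, tokenize=False
)
input_ids = tokenizer(prompt, return_tensors="pt").input_ids.to("cuda")

# Diffusion sampling: dLLMs are not sampled with model.generate(); use the
# `generate` helper from the LLaDA reference code (an assumption here), with
# settings matching this card's eval configuration (512 steps, block size 32).
# from generate import generate  # from the LLaDA/BGPO repository
# out = generate(model, input_ids, steps=512, gen_length=512,
#                block_length=32, temperature=0.0)
# print(tokenizer.decode(out[0, input_ids.shape[1]:], skip_special_tokens=True))
```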