---
license: apache-2.0
language:
  - en
tags:
  - reinforcement-learning
  - mathematics
  - dllm
  - bgpo
  - llada
size_categories:
  - 8B
---

# LLaDA-8B-BGPO-math

Paper | Code

## Model Description

LLaDA-8B-BGPO-math is an 8-billion-parameter diffusion large language model (dLLM), fine-tuned from LLaDA-8B-Instruct with Boundary-Guided Policy Optimization (BGPO) to strengthen its mathematical reasoning capabilities.

## Model Details

- **Model Type:** Diffusion Large Language Model (dLLM)
- **Parameters:** 8 billion
- **Training Method:** Boundary-Guided Policy Optimization (BGPO)
- **Base Model:** LLaDA-8B-Instruct
- **Task:** Mathematics
- **Language:** English

## Training Details

- **Training Steps:** 700
- **Response Length:** 512 tokens
- **Train Diffusion Steps:** 256
- **Eval Diffusion Steps:** 512
- **Block Size:** 32
- **Monte Carlo Sample Size ($n_t$):** 16
- **Learning Rate:** 5e-7
- **Batch Size:** 16
- **Framework:** Built on VeRL (Volcengine Reinforcement Learning); the sketch below collects these settings into one configuration.
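
For reference, the hyperparameters above can be gathered into a single configuration object. The sketch below is a minimal, hypothetical illustration: the key names are assumptions chosen for readability and do not reflect the actual VeRL/BGPO configuration schema.

```python
# Hypothetical training configuration mirroring the hyperparameters listed
# above. Key names are illustrative, NOT the actual VeRL/BGPO schema.
bgpo_math_config = {
    "base_model": "GSAI-ML/LLaDA-8B-Instruct",
    "algorithm": "bgpo",
    "total_training_steps": 700,
    "response_length": 512,         # max generated tokens per response
    "train_diffusion_steps": 256,   # denoising steps during training rollouts
    "eval_diffusion_steps": 512,    # denoising steps during evaluation
    "block_size": 32,               # block length for blockwise generation
    "monte_carlo_sample_size": 16,  # n_t: samples for the Monte Carlo estimate
    "learning_rate": 5e-7,
    "train_batch_size": 16,
}
```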

## Usage & Limitations

- Primarily designed for mathematical tasks.
- Performance may vary on other tasks and domains.
- Inference requires a GPU with enough memory for an 8B-parameter model (about 16 GB for the weights alone in bfloat16); a loading sketch follows below.
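
As a rough starting point, the snippet below shows one way to load the model, following the loading convention of the base LLaDA-8B-Instruct card. It is a sketch under assumptions: the repo id `linny2002/LLaDA-8B-BGPO-math` is inferred from this card, and the diffusion sampling step (the `generate` helper with `steps` and `block_length` arguments) comes from the LLaDA reference code, not from this repository.

```python
import torch
from transformers import AutoModel, AutoTokenizer

# Repo id inferred from this card; adjust if the model lives elsewhere.
model_id = "linny2002/LLaDA-8B-BGPO-math"

tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModel.from_pretrained(
    model_id,
    trust_remote_code=True,       # LLaDA ships custom modeling code
    torch_dtype=torch.bfloat16,
).to("cuda").eval()

# Build a chat-formatted math prompt.
messages = [{"role": "user", "content": "What is 15 * 17?"}]
prompt = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, tokenize=False
)
input_ids = tokenizer(prompt, return_tensors="pt").input_ids.to("cuda")

# Diffusion sampling: dLLMs are not sampled with model.generate(); use the
# `generate` helper from the LLaDA reference code (an assumption here), with
# settings matching this card's eval configuration (512 steps, block size 32).
# from generate import generate  # from the LLaDA/BGPO repository
# out = generate(model, input_ids, steps=512, gen_length=512,
#                block_length=32, temperature=0.0)
# print(tokenizer.decode(out[0, input_ids.shape[1]:], skip_special_tokens=True))
```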