THU-KEG's Collections
LLaDA-8B-BGPO
Updated 22 days ago
Boundary-Guided Policy Optimization for Memory-Efficient RL of Diffusion Large Language Models
THU-KEG/LLaDA-8B-BGPO-math • Reinforcement Learning • 8B • Updated 19 days ago • 34 • 1
THU-KEG/LLaDA-8B-BGPO-code • Reinforcement Learning • 8B • Updated 19 days ago • 28 • 1
THU-KEG/LLaDA-8B-BGPO-countdown • Reinforcement Learning • 8B • Updated 19 days ago • 30 • 1
THU-KEG/LLaDA-8B-BGPO-sudoku • Reinforcement Learning • 8B • Updated 19 days ago • 30 • 1