BounharAbdelaziz's Collections
RLHF/RLVR
Updated 13 days ago
Some RLHF/RLVR experiments using GRPO and DPO.
BounharAbdelaziz/Qwen2.5-3B-GRPO-Math-GSM8K • Text Generation • 3B • Updated Jun 25 • 14
BounharAbdelaziz/Qwen2.5-0.5B-DPO-English-Orca • Text Generation • 0.5B • Updated Jun 25 • 5
BounharAbdelaziz/Qwen2.5-0.5B-DPO-French-Orca • Text Generation • 0.5B • Updated Jun 25 • 6