KevinG/Meta-Llama-3-8B-Instruct-GRPO-alpaca_naive_100_no_KL Text Generation • 8B • Updated Jun 16 • 4
KevinG/Meta-Llama-3-8B-Instruct-GRPO-alpaca_naive_500_no_KL Text Generation • 8B • Updated Jun 16 • 4
KevinG/Meta-Llama-3-8B-Instruct-GRPO-AT-combine-10-mix-100-10-more-rounds-AT-1 Text Generation • 8B • Updated Jun 26 • 4
KevinG/Meta-Llama-3-8B-Instruct-GRPO-AT-combine-100-reproduce-AT-1 Text Generation • 8B • Updated Jun 26 • 4
KevinG/Meta-Llama-3-8B-Instruct-GRPO-AT-combine-100-reproduce-no-seed-AT-1 Text Generation • 8B • Updated Jun 26 • 4
KevinG/Meta-Llama-3-8B-Instruct-GRPO-AT-combine-100-reproduce-version-AT-1 Text Generation • 8B • Updated Jun 26 • 4
KevinG/Meta-Llama-3-8B-Instruct-GRPO-AT-combine-100-reproduce-version-3-AT-1 Text Generation • 8B • Updated Jun 26 • 7
KevinG/Meta-Llama-3-8B-Instruct-GRPO-alpaca_combine_100_no_KL_42_reproduce Text Generation • 8B • Updated Jun 27 • 8
KevinG/Meta-Llama-3-8B-Instruct-GRPO-mixed-cosine-42 Text Generation • 8B • Updated 28 days ago • 143
KevinG/Meta-Llama-3-8B-Instruct-GRPO-mixed-llm-judge-42 Text Generation • 8B • Updated 27 days ago • 77