hdong0/Qwen3-8B-base-Open-R1-GRPO_dapo_acc_2048_to_16384_nokl Text Generation • 8B • Updated Oct 12 • 16
hdong0/Qwen3-8B-base-Open-R1-GRPO_dapo_acc_4096_to_16384_nokl Text Generation • 8B • Updated Oct 14 • 5
hdong0/Qwen3-8B-base-Open-R1-GRPO_dapo_acc_8192_to_16384_nokl Text Generation • 8B • Updated Oct 15 • 10
DCAgent/staqc-sandboxes-traces-terminus-2_Qwen3-8B-Base Text Generation • 308k • Updated 22 days ago • 96
ChenWu98/zebra_logic_train_for_sft_correct_responses_gpt-oss-20b_shortest_from_qwen3-8b-base Updated 13 days ago
ChenWu98/zebra_logic_train_for_sft_correct_responses_gpt-oss-20b_shortest_from_qwen3-8b-base-1e-5 Updated 12 days ago
ChenWu98/zebra_logic_train_for_sft_correct_responses_gpt-oss-20b_shortest_n2000_from_qwen3-8b-base-1e-5 Updated 7 days ago
ChenWu98/zebra_logic_and_e3_correct_responses_gpt-oss-20b_shortest_zebra_n2000_from_qwen3-8b-base-1e-5 Updated 6 days ago
ChenWu98/zebra_logic_train_for_sft_correct_responses_Qwen3-4B_longest_from_qwen3-8b-base-1e-5 Updated 5 days ago
ChenWu98/qwen3-8b-base_e3_correct_responses_gpt-oss-20b_reasoning_effort_low_shortest-1e-5 Updated 3 days ago