🐙 OctoThinker - a koalazf99 Collection

updated Jun 26

Mid-training Incentivizes Reinforcement Learning Scaling

OctoThinker: Mid-training Incentivizes Reinforcement Learning Scaling
Paper • 2506.20512 • Published Jun 25 • 46

OctoThinker/MegaMath-Web-Pro-Max
Viewer • Updated about 1 month ago • 69.2M • 4.83k • 34

OctoThinker/OctoThinker-8B-Long-Base
Text Generation • 8B • Updated about 1 month ago • 45

OctoThinker/OctoThinker-8B-Hybrid-Base
Text Generation • 8B • Updated about 1 month ago • 96 • 2

OctoThinker/OctoThinker-8B-Short-Base
Text Generation • 8B • Updated about 1 month ago • 1k

OctoThinker/OctoThinker-3B-Short-Zero
Text Generation • 4B • Updated 25 days ago • 11

OctoThinker/OctoThinker-3B-Hybrid-Zero
Text Generation • 4B • Updated 25 days ago • 338 • 1

OctoThinker/OctoThinker-1B-Long-Zero
Text Generation • 1B • Updated about 1 month ago • 9

OctoThinker/OctoThinker-1B-Hybrid-Zero
Text Generation • 1B • Updated about 1 month ago • 9

OctoThinker/OctoThinker-1B-Short-Zero
Text Generation • 1B • Updated about 1 month ago • 10

OctoThinker/Llama3.2-3B-Zero
4B • Updated Apr 22 • 3

OctoThinker/OctoThinker-3B-Long-Zero
Text Generation • 4B • Updated about 1 month ago • 75

OctoThinker/OctoThinker-1B-Long-Base
Text Generation • 1B • Updated about 1 month ago • 12

OctoThinker/OctoThinker-1B-Short-Base
Text Generation • 1B • Updated about 1 month ago • 219

OctoThinker/OctoThinker-1B-Hybrid-Base
Text Generation • 1B • Updated about 1 month ago • 11

OctoThinker/OctoThinker-3B-Long-Base
Text Generation • 3B • Updated about 1 month ago • 6.27k

OctoThinker/OctoThinker-3B-Hybrid-Base
Text Generation • 3B • Updated 25 days ago • 411

OctoThinker/OctoThinker-3B-Short-Base
Text Generation • 3B • Updated 25 days ago • 1.6k