Alessamo's Collections

RL

updated Jul 2

  • Does Reinforcement Learning Really Incentivize Reasoning Capacity in LLMs Beyond the Base Model?

    Paper • 2504.13837 • Published Apr 18 • 134

  • TTRL: Test-Time Reinforcement Learning

    Paper • 2504.16084 • Published Apr 22 • 120

  • What, How, Where, and How Well? A Survey on Test-Time Scaling in Large Language Models

    Paper • 2503.24235 • Published Mar 31 • 55

  • Beyond the 80/20 Rule: High-Entropy Minority Tokens Drive Effective Reinforcement Learning for LLM Reasoning

    Paper • 2506.01939 • Published Jun 2 • 176

  • Reinforcement Pre-Training

    Paper • 2506.08007 • Published Jun 9 • 253

  • Revisiting Reinforcement Learning for LLM Reasoning from A Cross-Domain Perspective

    Paper • 2506.14965 • Published Jun 17 • 49

  • SRFT: A Single-Stage Method with Supervised and Reinforcement Fine-Tuning for Reasoning

    Paper • 2506.19767 • Published Jun 24 • 13

  • Does Math Reasoning Improve General LLM Capabilities? Understanding Transferability of LLM Reasoning

    Paper • 2507.00432 • Published Jul 1 • 72