One-Token Rollout: Guiding Supervised Fine-Tuning of LLMs with Policy Gradient Paper • 2509.26313 • Published Sep 30 • 4 • 4
One-Token Rollout: Guiding Supervised Fine-Tuning of LLMs with Policy Gradient Paper • 2509.26313 • Published Sep 30 • 4 • 4