Efficient OpAmp Adaptation for Zoom Attention to Golden Contexts Paper • 2502.12502 • Published Feb 18
On-Policy Optimization with Group Equivalent Preference for Multi-Programming Language Understanding Paper • 2505.12723 • Published May 19
ToTRL: Unlock LLM Tree-of-Thoughts Reasoning Potential through Puzzles Solving Paper • 2505.12717 • Published May 19
One-Token Rollout: Guiding Supervised Fine-Tuning of LLMs with Policy Gradient Paper • 2509.26313 • Published Sep 30