H

SunSwallow

AI & ML interests

None yet

Organizations

None yet

Recent Activity

upvoted a paper 23 days ago

Agent Learning via Early Experience

Paper • 2510.08558 • Published 26 days ago • 258

upvoted 2 papers 26 days ago

Learn the Ropes, Then Trust the Wins: Self-imitation with Progressive Exploration for Agentic Reinforcement Learning

Paper • 2509.22601 • Published Sep 26 • 29

Training-Free Group Relative Policy Optimization

Paper • 2510.08191 • Published 27 days ago • 44

upvoted a paper about 1 month ago

From Uniform to Heterogeneous: Tailoring Policy Optimization to Every Token's Nature

Paper • 2509.16591 • Published Sep 20 • 2

upvoted a paper 2 months ago

The Landscape of Agentic Reinforcement Learning for LLMs: A Survey

Paper • 2509.02547 • Published Sep 2 • 220

upvoted a paper 3 months ago

WebWatcher: Breaking New Frontier of Vision-Language Deep Research Agent

Paper • 2508.05748 • Published Aug 7 • 137

upvoted a collection 3 months ago

OpenMathReasoning

Models and datasets from "AIMO-2 Winning Solution: Building State-of-the-Art Mathematical Reasoning Models with OpenMathReasoning dataset" • 7 items • Updated about 19 hours ago • 44

commented a paper 3 months ago

Agentic Reinforced Policy Optimization

Paper • 2507.19849 • Published Jul 26 • 156