Reinforcement Learning
GDPO: Group reward-Decoupled Normalization Policy Optimization for Multi-reward RL Optimization Paper • 2601.05242 • Published Jan 8 • 232
Villa-46
Atom-Searcher: Enhancing Agentic Deep Research via Fine-Grained Atomic Thought Reward Paper • 2508.12800 • Published Aug 18, 2025 • 6