

zpysky1125

pyzhao


AI & ML interests

None yet

Recent Activity



upvoted 3 articles 15 days ago

What makes good reasoning data

Article • 15 days ago • 31

Aligning to What? Rethinking Agent Generalization in MiniMax M2

Article • 15 days ago • 22

Why Did MiniMax M2 End Up as a Full Attention Model?

Article • 15 days ago • 59

upvoted a paper 2 months ago

WebExplorer: Explore and Evolve for Training Long-Horizon Web Agents

Paper • 2509.06501 • Published Sep 8 • 78

upvoted 2 papers 5 months ago

MiniMax-M1: Scaling Test-Time Compute Efficiently with Lightning Attention

Paper • 2506.13585 • Published Jun 16 • 270

ProRL: Prolonged Reinforcement Learning Expands Reasoning Boundaries in Large Language Models

Paper • 2505.24864 • Published May 30 • 141

upvoted 4 papers 6 months ago

SynLogic: Synthesizing Verifiable Reasoning Data at Scale for Learning Logical Reasoning and Beyond

Paper • 2505.19641 • Published May 26 • 67

One RL to See Them All: Visual Triple Unified Reinforcement Learning

Paper • 2505.18129 • Published May 23 • 59

MiMo: Unlocking the Reasoning Potential of Language Model -- From Pretraining to Posttraining

Paper • 2505.07608 • Published May 12 • 82

MiniMax-Speech: Intrinsic Zero-Shot Text-to-Speech with a Learnable Speaker Encoder

Paper • 2505.07916 • Published May 12 • 132

upvoted a paper 8 months ago

Exploring Data Scaling Trends and Effects in Reinforcement Learning from Human Feedback

Paper • 2503.22230 • Published Mar 28 • 45

upvoted a paper 10 months ago

REINFORCE++: A Simple and Efficient Approach for Aligning Large Language Models

Paper • 2501.03262 • Published Jan 4 • 103

upvoted an article over 1 year ago

Cosmopedia: how to create large-scale synthetic data for pre-training Large Language Models

Article • Mar 20, 2024 • 104