Stella Li PRO
stellalisy
AI & ML interests
None yet
Recent Activity
published a dataset 15 days ago: stellalisy/PrefPalette
updated a dataset 15 days ago: stellalisy/PrefPalette
updated a dataset 29 days ago: stellalisy/HorizonPref_natural_0827
Organizations
Spurious Rewards
Spurious Rewards: Rethinking Training Signals in RLVR
stellalisy/rethink_rlvr_reproduce-ground_truth-qwen2.5_math_7b-lr5e-7-kl0.00-step50
Text Generation • 8B • Updated • 3
stellalisy/rethink_rlvr_reproduce-ground_truth-qwen2.5_math_7b-lr5e-7-kl0.00-step100
Text Generation • 8B • Updated • 4
stellalisy/rethink_rlvr_reproduce-ground_truth-qwen2.5_math_7b-lr5e-7-kl0.00-step150
Text Generation • 8B • Updated • 16
stellalisy/rethink_rlvr_reproduce-majority_vote-qwen2.5_math_7b-lr5e-7-kl0.00-step50
Text Generation • 8B • Updated • 9
Personalized Reasoning
models
30
stellalisy/system_select_dpo-3b-lr1e-5-b0.1
Text Generation • 3B • Updated • 3
stellalisy/system_select_dpo-3b-lr1e-6-b0.1
Text Generation • 3B • Updated • 5
stellalisy/system_select_dpo-3b-lr1e-5-b0.0
Text Generation • 3B • Updated • 3
stellalisy/system_select_dpo-1b-lr1e-6-b0.1
Text Generation • 1B • Updated • 1
stellalisy/system_select_dpo-1b-lr1e-5-b0.1
Text Generation • 1B • Updated • 1
stellalisy/system_select_dpo-1b-lr1e-6-b0.0
Text Generation • 1B • Updated • 1
stellalisy/system_select_dpo-1b-lr1e-5-b0.0
Text Generation • 1B • Updated • 2
stellalisy/rethink_rlvr_reproduce-incorrect-qwen2.5_math_7b-lr5e-7-kl0.00-step150
Text Generation • 8B • Updated • 8
stellalisy/rethink_rlvr_reproduce-incorrect-qwen2.5_math_7b-lr5e-7-kl0.00-step100
Text Generation • 8B • Updated • 8
stellalisy/rethink_rlvr_reproduce-incorrect-qwen2.5_math_7b-lr5e-7-kl0.00-step50
Text Generation • 8B • Updated • 6
datasets
21
stellalisy/PrefPalette
Viewer • Updated • 2.01M • 5
stellalisy/HorizonPref_natural_0827
Viewer • Updated • 1.75k • 66
stellalisy/DAPO-Math-14k-Processed-RLVR_random
Viewer • Updated • 14.1k • 249
stellalisy/rlvr_orz_math_57k_collected_random
Viewer • Updated • 56.9k • 88
stellalisy/personalized_simpleqa
Preview • Updated • 14
stellalisy/personalized_socialiqa
Preview • Updated • 10
stellalisy/personalized_scienceqa
Preview • Updated • 11
stellalisy/personalized_mmlu
Preview • Updated • 10
stellalisy/personalized_medqa
Preview • Updated • 14
stellalisy/personalized_commonsenseqa
Preview • Updated • 10