Models
Datasets
Spaces
Docs
Enterprise
Pricing
Log In
Sign Up

Collections

Discover the best community collections!

Collections including paper arXiv:2509.18849

EVA-CLIP-18B: Scaling CLIP to 18 Billion Parameters

Paper • 2402.04252 • Published Feb 6, 2024 • 28
Vision Superalignment: Weak-to-Strong Generalization for Vision Foundation Models

Paper • 2402.03749 • Published Feb 6, 2024 • 14
ScreenAI: A Vision-Language Model for UI and Infographics Understanding

Paper • 2402.04615 • Published Feb 7, 2024 • 44
EfficientViT-SAM: Accelerated Segment Anything Model Without Performance Loss

Paper • 2402.05008 • Published Feb 7, 2024 • 23

Open Data Synthesis For Deep Research

Paper • 2509.00375 • Published Aug 30 • 68
Beyond Correctness: Harmonizing Process and Outcome Rewards through RL Training

Paper • 2509.03403 • Published Sep 3 • 21
LMEnt: A Suite for Analyzing Knowledge in Language Models from Pretraining Data to Representations

Paper • 2509.03405 • Published Sep 3 • 23
SATQuest: A Verifier for Logical Reasoning Evaluation and Reinforcement Fine-Tuning of LLMs

Paper • 2509.00930 • Published Aug 31 • 4

RL+reason model

RL + Transformer = A General-Purpose Problem Solver

Paper • 2501.14176 • Published Jan 24 • 28
Towards General-Purpose Model-Free Reinforcement Learning

Paper • 2501.16142 • Published Jan 27 • 30
SFT Memorizes, RL Generalizes: A Comparative Study of Foundation Model Post-training

Paper • 2501.17161 • Published Jan 28 • 123
MaxInfoRL: Boosting exploration in reinforcement learning through information gain maximization

Paper • 2412.12098 • Published Dec 16, 2024 • 4

MAPO: Mixed Advantage Policy Optimization

MAPO: Mixed Advantage Policy Optimization

Paper • 2509.18849 • Published Sep 23 • 26
WilliamHuang91/qwen2_5_vl_7b_geo_12_grpo

8B • Updated Sep 24 • 1
WilliamHuang91/qwen2_5_vl_7b_emoset_12_grpo

8B • Updated Sep 24 • 1
WilliamHuang91/qwen2_5_vl_7b_emoset_12_grpomixpro

8B • Updated Sep 24 • 2

Group Sequence Policy Optimization

Paper • 2507.18071 • Published Jul 24 • 307
LAPO: Internalizing Reasoning Efficiency via Length-Adaptive Policy Optimization

Paper • 2507.15758 • Published Jul 21 • 35
Hierarchical Budget Policy Optimization for Adaptive Reasoning

Paper • 2507.15844 • Published Jul 21 • 16
Semi-off-Policy Reinforcement Learning for Vision-Language Slow-thinking Reasoning

Paper • 2507.16814 • Published Jul 22 • 21

EVA-CLIP-18B: Scaling CLIP to 18 Billion Parameters

Paper • 2402.04252 • Published Feb 6, 2024 • 28
Vision Superalignment: Weak-to-Strong Generalization for Vision Foundation Models

Paper • 2402.03749 • Published Feb 6, 2024 • 14
ScreenAI: A Vision-Language Model for UI and Infographics Understanding

Paper • 2402.04615 • Published Feb 7, 2024 • 44
EfficientViT-SAM: Accelerated Segment Anything Model Without Performance Loss

Paper • 2402.05008 • Published Feb 7, 2024 • 23

MAPO: Mixed Advantage Policy Optimization

MAPO: Mixed Advantage Policy Optimization

Paper • 2509.18849 • Published Sep 23 • 26
WilliamHuang91/qwen2_5_vl_7b_geo_12_grpo

8B • Updated Sep 24 • 1
WilliamHuang91/qwen2_5_vl_7b_emoset_12_grpo

8B • Updated Sep 24 • 1
WilliamHuang91/qwen2_5_vl_7b_emoset_12_grpomixpro

8B • Updated Sep 24 • 2

Open Data Synthesis For Deep Research

Paper • 2509.00375 • Published Aug 30 • 68
Beyond Correctness: Harmonizing Process and Outcome Rewards through RL Training

Paper • 2509.03403 • Published Sep 3 • 21
LMEnt: A Suite for Analyzing Knowledge in Language Models from Pretraining Data to Representations

Paper • 2509.03405 • Published Sep 3 • 23
SATQuest: A Verifier for Logical Reasoning Evaluation and Reinforcement Fine-Tuning of LLMs

Paper • 2509.00930 • Published Aug 31 • 4

Group Sequence Policy Optimization

Paper • 2507.18071 • Published Jul 24 • 307
LAPO: Internalizing Reasoning Efficiency via Length-Adaptive Policy Optimization

Paper • 2507.15758 • Published Jul 21 • 35
Hierarchical Budget Policy Optimization for Adaptive Reasoning

Paper • 2507.15844 • Published Jul 21 • 16
Semi-off-Policy Reinforcement Learning for Vision-Language Slow-thinking Reasoning

Paper • 2507.16814 • Published Jul 22 • 21

RL+reason model

RL + Transformer = A General-Purpose Problem Solver

Paper • 2501.14176 • Published Jan 24 • 28
Towards General-Purpose Model-Free Reinforcement Learning

Paper • 2501.16142 • Published Jan 27 • 30
SFT Memorizes, RL Generalizes: A Comparative Study of Foundation Model Post-training

Paper • 2501.17161 • Published Jan 28 • 123
MaxInfoRL: Boosting exploration in reinforcement learning through information gain maximization

Paper • 2412.12098 • Published Dec 16, 2024 • 4

Company

TOS Privacy About Jobs

Website

Models Datasets Spaces Pricing Docs