- EVA-CLIP-18B: Scaling CLIP to 18 Billion Parameters
  Paper • 2402.04252 • Published • 28
- Vision Superalignment: Weak-to-Strong Generalization for Vision Foundation Models
  Paper • 2402.03749 • Published • 14
- ScreenAI: A Vision-Language Model for UI and Infographics Understanding
  Paper • 2402.04615 • Published • 44
- EfficientViT-SAM: Accelerated Segment Anything Model Without Performance Loss
  Paper • 2402.05008 • Published • 23

Collections
Collections including paper arxiv:2508.05748

- Sharing is Caring: Efficient LM Post-Training with Collective RL Experience Sharing
  Paper • 2509.08721 • Published • 672
- A.S.E: A Repository-Level Benchmark for Evaluating Security in AI-Generated Code
  Paper • 2508.18106 • Published • 342
- VLA-Adapter: An Effective Paradigm for Tiny-Scale Vision-Language-Action Model
  Paper • 2509.09372 • Published • 235
- The Landscape of Agentic Reinforcement Learning for LLMs: A Survey
  Paper • 2509.02547 • Published • 220

- USO: Unified Style and Subject-Driven Generation via Disentangled and Reward Learning
  Paper • 2508.18966 • Published • 56
- Sharing is Caring: Efficient LM Post-Training with Collective RL Experience Sharing
  Paper • 2509.08721 • Published • 672
- FastVLM: Efficient Vision Encoding for Vision Language Models
  Paper • 2412.13303 • Published • 70
- Large-Scale Text-to-Image Model with Inpainting is a Zero-Shot Subject-Driven Image Generator
  Paper • 2411.15466 • Published • 39

- WebWatcher: Breaking New Frontier of Vision-Language Deep Research Agent
  Paper • 2508.05748 • Published • 137
- TTT3R: 3D Reconstruction as Test-Time Training
  Paper • 2509.26645 • Published • 14
- Human3R: Everyone Everywhere All at Once
  Paper • 2510.06219 • Published • 9
- Less is More: Recursive Reasoning with Tiny Networks
  Paper • 2510.04871 • Published • 463

- WebWatcher: Breaking New Frontier of Vision-Language Deep Research Agent
  Paper • 2508.05748 • Published • 137
- Agent Lightning: Train ANY AI Agents with Reinforcement Learning
  Paper • 2508.03680 • Published • 96
- SpatialLM: Training Large Language Models for Structured Indoor Modeling
  Paper • 2506.07491 • Published • 50
- LongSplat: Robust Unposed 3D Gaussian Splatting for Casual Long Videos
  Paper • 2508.14041 • Published • 59

- WebWatcher: Breaking New Frontier of Vision-Language Deep Research Agent
  Paper • 2508.05748 • Published • 137
- WebDancer: Towards Autonomous Information Seeking Agency
  Paper • 2505.22648 • Published • 33
- AgentScope 1.0: A Developer-Centric Framework for Building Agentic Applications
  Paper • 2508.16279 • Published • 52