Collections
Discover the best community collections!
Collections including paper arxiv:2511.13720

- Back to Basics: Let Denoising Generative Models Denoise
  Paper • 2511.13720 • Published • 56
- Virtual Width Networks
  Paper • 2511.11238 • Published • 35
- Routing Manifold Alignment Improves Generalization of Mixture-of-Experts LLMs
  Paper • 2511.07419 • Published • 25
- When Modalities Conflict: How Unimodal Reasoning Uncertainty Governs Preference Dynamics in MLLMs
  Paper • 2511.02243 • Published • 24

- Arbitrary-steps Image Super-resolution via Diffusion Inversion
  Paper • 2412.09013 • Published • 13
- Deep Researcher with Test-Time Diffusion
  Paper • 2507.16075 • Published • 66
- ∇NABLA: Neighborhood Adaptive Block-Level Attention
  Paper • 2507.13546 • Published • 123
- Yume: An Interactive World Generation Model
  Paper • 2507.17744 • Published • 85

- ComfyUI-R1: Exploring Reasoning Models for Workflow Generation
  Paper • 2506.09790 • Published • 53
- Saffron-1: Towards an Inference Scaling Paradigm for LLM Safety Assurance
  Paper • 2506.06444 • Published • 73
- DeepResearch Bench: A Comprehensive Benchmark for Deep Research Agents
  Paper • 2506.11763 • Published • 71
- Agentic Reasoning: Reasoning LLMs with Tools for the Deep Research
  Paper • 2502.04644 • Published • 4

- FastVLM: Efficient Vision Encoding for Vision Language Models
  Paper • 2412.13303 • Published • 72
- rStar2-Agent: Agentic Reasoning Technical Report
  Paper • 2508.20722 • Published • 115
- AgentScope 1.0: A Developer-Centric Framework for Building Agentic Applications
  Paper • 2508.16279 • Published • 53
- OmniWorld: A Multi-Domain and Multi-Modal Dataset for 4D World Modeling
  Paper • 2509.12201 • Published • 103

- MiCo: Multi-image Contrast for Reinforcement Visual Reasoning
  Paper • 2506.22434 • Published • 10
- VisionThink: Smart and Efficient Vision Language Model via Reinforcement Learning
  Paper • 2507.13348 • Published • 75
- RewardDance: Reward Scaling in Visual Generation
  Paper • 2509.08826 • Published • 72
- Grasp Any Region: Towards Precise, Contextual Pixel Understanding for Multimodal LLMs
  Paper • 2510.18876 • Published • 36

- ProRL: Prolonged Reinforcement Learning Expands Reasoning Boundaries in Large Language Models
  Paper • 2505.24864 • Published • 141
- ComfyUI-Copilot: An Intelligent Assistant for Automated Workflow Development
  Paper • 2506.05010 • Published • 79
- SeedVR2: One-Step Video Restoration via Diffusion Adversarial Post-Training
  Paper • 2506.05301 • Published • 56
- LLaDA-V: Large Language Diffusion Models with Visual Instruction Tuning
  Paper • 2505.16933 • Published • 34

- EVA-CLIP-18B: Scaling CLIP to 18 Billion Parameters
  Paper • 2402.04252 • Published • 29
- Vision Superalignment: Weak-to-Strong Generalization for Vision Foundation Models
  Paper • 2402.03749 • Published • 14
- ScreenAI: A Vision-Language Model for UI and Infographics Understanding
  Paper • 2402.04615 • Published • 44
- EfficientViT-SAM: Accelerated Segment Anything Model Without Performance Loss
  Paper • 2402.05008 • Published • 23