-
Describe What You See with Multimodal Large Language Models to Enhance Video Recommendations
Paper • 2508.09789 • Published • 5 -
MM-BrowseComp: A Comprehensive Benchmark for Multimodal Browsing Agents
Paper • 2508.13186 • Published • 18 -
ZARA: Zero-shot Motion Time-Series Analysis via Knowledge and Retrieval Driven LLM Agents
Paper • 2508.04038 • Published • 1 -
Prompt Orchestration Markup Language
Paper • 2508.13948 • Published • 48
Collections
Collections including paper arxiv:2510.21618
-
GUI-G^2: Gaussian Reward Modeling for GUI Grounding
Paper • 2507.15846 • Published • 132 -
WebWatcher: Breaking New Frontier of Vision-Language Deep Research Agent
Paper • 2508.05748 • Published • 137 -
Mobile-Agent-v3: Foundamental Agents for GUI Automation
Paper • 2508.15144 • Published • 64 -
AgentFly: Fine-tuning LLM Agents without Fine-tuning LLMs
Paper • 2508.16153 • Published • 154
-
ruslanmv/Medical-Llama3-8B
Text Generation • 8B • Updated • 992 • 104 -
Robust Speech Recognition via Large-Scale Weak Supervision
Paper • 2212.04356 • Published • 40 -
Mykes/medicus
Text Generation • 3B • Updated • 72 • 5 -
DeepAgent: A General Reasoning Agent with Scalable Toolsets
Paper • 2510.21618 • Published • 92
-
Emu3.5: Native Multimodal Models are World Learners
Paper • 2510.26583 • Published • 98 -
RECALL: REpresentation-aligned Catastrophic-forgetting ALLeviation via Hierarchical Model Merging
Paper • 2510.20479 • Published • 10 -
A Definition of AGI
Paper • 2510.18212 • Published • 33 -
Video-As-Prompt: Unified Semantic Control for Video Generation
Paper • 2510.20888 • Published • 44
-
HoloScene: Simulation-Ready Interactive 3D Worlds from a Single Video
Paper • 2510.05560 • Published • 7 -
TaTToo: Tool-Grounded Thinking PRM for Test-Time Scaling in Tabular Reasoning
Paper • 2510.06217 • Published • 62 -
Less is More: Recursive Reasoning with Tiny Networks
Paper • 2510.04871 • Published • 464 -
Fast-dLLM v2: Efficient Block-Diffusion LLM
Paper • 2509.26328 • Published • 49
-
Writing in the Margins: Better Inference Pattern for Long Context Retrieval
Paper • 2408.14906 • Published • 144 -
Training Language Models to Self-Correct via Reinforcement Learning
Paper • 2409.12917 • Published • 140 -
Towards a Unified View of Preference Learning for Large Language Models: A Survey
Paper • 2409.02795 • Published • 72 -
Attention Heads of Large Language Models: A Survey
Paper • 2409.03752 • Published • 92
-
Can Large Language Models Understand Context?
Paper • 2402.00858 • Published • 23 -
OLMo: Accelerating the Science of Language Models
Paper • 2402.00838 • Published • 84 -
Self-Rewarding Language Models
Paper • 2401.10020 • Published • 151 -
SemScore: Automated Evaluation of Instruction-Tuned LLMs based on Semantic Textual Similarity
Paper • 2401.17072 • Published • 25