Models
Datasets
Spaces
Docs
Enterprise
Pricing
Log In
Sign Up

Collections

Discover the best community collections!

Collections including paper arXiv:2510.25741

Reasoning Papers

Klear-Reasoner: Advancing Reasoning Capability via Gradient-Preserving Clipping Policy Optimization

Paper • 2508.07629 • Published Aug 11 • 41
Less Is More: Training-Free Sparse Attention with Global Locality for Efficient Reasoning

Paper • 2508.07101 • Published Aug 9 • 13
Compressing Chain-of-Thought in LLMs via Step Entropy

Paper • 2508.03346 • Published Aug 5 • 7
Train Long, Think Short: Curriculum Learning for Efficient Reasoning

Paper • 2508.08940 • Published Aug 12 • 26

Scaling Latent Reasoning via Looped Language Models

Paper • 2510.25741 • Published 11 days ago • 202
ByteDance/Ouro-1.4B

Text Generation • Updated 9 days ago • 2.99k • 41
ByteDance/Ouro-2.6B

Text Generation • Updated 9 days ago • 700 • 37
ByteDance/Ouro-1.4B-Thinking

Text Generation • Updated 9 days ago • 398 • 18

The Dragon Hatchling: The Missing Link between the Transformer and Models of the Brain

Paper • 2509.26507 • Published Sep 30 • 526
Less is More: Recursive Reasoning with Tiny Networks

Paper • 2510.04871 • Published Oct 6 • 468
A.S.E: A Repository-Level Benchmark for Evaluating Security in AI-Generated Code

Paper • 2508.18106 • Published Aug 25 • 342
Intern-S1: A Scientific Multimodal Foundation Model

Paper • 2508.15763 • Published Aug 21 • 255

Pass@k Training for Adaptively Balancing Exploration and Exploitation of Large Reasoning Models

Paper • 2508.10751 • Published Aug 14 • 28
Reinforcement Pre-Training

Paper • 2506.08007 • Published Jun 9 • 262
MCP-Universe: Benchmarking Large Language Models with Real-World Model Context Protocol Servers

Paper • 2508.14704 • Published Aug 20 • 42
AgentFly: Fine-tuning LLM Agents without Fine-tuning LLMs

Paper • 2508.16153 • Published Aug 22 • 154

ByteDance Papers

ByteDance papers collection

Contrastive Learning for Many-to-many Multilingual Neural Machine Translation

Paper • 2105.09501 • Published May 20, 2021
Cross-modal Contrastive Learning for Speech Translation

Paper • 2205.02444 • Published May 5, 2022
ByteTransformer: A High-Performance Transformer Boosted for Variable-Length Inputs

Paper • 2210.03052 • Published Oct 6, 2022
Diffusion Glancing Transformer for Parallel Sequence to Sequence Learning

Paper • 2212.10240 • Published Dec 20, 2022 • 1

about 21 hours ago

lusxvr/nanoVLM-222M

Image-Text-to-Text • 0.2B • Updated May 8 • 216 • 98
Search-R1: Training LLMs to Reason and Leverage Search Engines with Reinforcement Learning

Paper • 2503.09516 • Published Mar 12 • 36
AlphaOne: Reasoning Models Thinking Slow and Fast at Test Time

Paper • 2505.24863 • Published May 30 • 97
QwenLong-L1: Towards Long-Context Large Reasoning Models with Reinforcement Learning

Paper • 2505.17667 • Published May 23 • 88

Emu3.5: Native Multimodal Models are World Learners

Paper • 2510.26583 • Published 10 days ago • 102
RECALL: REpresentation-aligned Catastrophic-forgetting ALLeviation via Hierarchical Model Merging

Paper • 2510.20479 • Published 17 days ago • 10
A Definition of AGI

Paper • 2510.18212 • Published 20 days ago • 33
Video-As-Prompt: Unified Semantic Control for Video Generation

Paper • 2510.20888 • Published 17 days ago • 44

Language Models Model Language

Paper • 2510.12766 • Published 26 days ago • 23
The Dragon Hatchling: The Missing Link between the Transformer and Models of the Brain

Paper • 2509.26507 • Published Sep 30 • 526
Less is More: Recursive Reasoning with Tiny Networks

Paper • 2510.04871 • Published Oct 6 • 468
Scaling Latent Reasoning via Looped Language Models

Paper • 2510.25741 • Published 11 days ago • 202

GUI-G^2: Gaussian Reward Modeling for GUI Grounding

Paper • 2507.15846 • Published Jul 21 • 132
WebWatcher: Breaking New Frontier of Vision-Language Deep Research Agent

Paper • 2508.05748 • Published Aug 7 • 137
Mobile-Agent-v3: Foundamental Agents for GUI Automation

Paper • 2508.15144 • Published Aug 21 • 64
AgentFly: Fine-tuning LLM Agents without Fine-tuning LLMs

Paper • 2508.16153 • Published Aug 22 • 154

Sparse Autoencoders Learn Monosemantic Features in Vision-Language Models

Paper • 2504.02821 • Published Apr 3 • 9
TimeChat-Online: 80% Visual Tokens are Naturally Redundant in Streaming Videos

Paper • 2504.17343 • Published Apr 24 • 13
ViSMaP: Unsupervised Hour-long Video Summarisation by Meta-Prompting

Paper • 2504.15921 • Published Apr 22 • 7
Causal-Copilot: An Autonomous Causal Analysis Agent

Paper • 2504.13263 • Published Apr 17 • 7

Reasoning Papers

Klear-Reasoner: Advancing Reasoning Capability via Gradient-Preserving Clipping Policy Optimization

Paper • 2508.07629 • Published Aug 11 • 41
Less Is More: Training-Free Sparse Attention with Global Locality for Efficient Reasoning

Paper • 2508.07101 • Published Aug 9 • 13
Compressing Chain-of-Thought in LLMs via Step Entropy

Paper • 2508.03346 • Published Aug 5 • 7
Train Long, Think Short: Curriculum Learning for Efficient Reasoning

Paper • 2508.08940 • Published Aug 12 • 26

about 21 hours ago

lusxvr/nanoVLM-222M

Image-Text-to-Text • 0.2B • Updated May 8 • 216 • 98
Search-R1: Training LLMs to Reason and Leverage Search Engines with Reinforcement Learning

Paper • 2503.09516 • Published Mar 12 • 36
AlphaOne: Reasoning Models Thinking Slow and Fast at Test Time

Paper • 2505.24863 • Published May 30 • 97
QwenLong-L1: Towards Long-Context Large Reasoning Models with Reinforcement Learning

Paper • 2505.17667 • Published May 23 • 88

Scaling Latent Reasoning via Looped Language Models

Paper • 2510.25741 • Published 11 days ago • 202
ByteDance/Ouro-1.4B

Text Generation • Updated 9 days ago • 2.99k • 41
ByteDance/Ouro-2.6B

Text Generation • Updated 9 days ago • 700 • 37
ByteDance/Ouro-1.4B-Thinking

Text Generation • Updated 9 days ago • 398 • 18

Emu3.5: Native Multimodal Models are World Learners

Paper • 2510.26583 • Published 10 days ago • 102
RECALL: REpresentation-aligned Catastrophic-forgetting ALLeviation via Hierarchical Model Merging

Paper • 2510.20479 • Published 17 days ago • 10
A Definition of AGI

Paper • 2510.18212 • Published 20 days ago • 33
Video-As-Prompt: Unified Semantic Control for Video Generation

Paper • 2510.20888 • Published 17 days ago • 44

The Dragon Hatchling: The Missing Link between the Transformer and Models of the Brain

Paper • 2509.26507 • Published Sep 30 • 526
Less is More: Recursive Reasoning with Tiny Networks

Paper • 2510.04871 • Published Oct 6 • 468
A.S.E: A Repository-Level Benchmark for Evaluating Security in AI-Generated Code

Paper • 2508.18106 • Published Aug 25 • 342
Intern-S1: A Scientific Multimodal Foundation Model

Paper • 2508.15763 • Published Aug 21 • 255

Language Models Model Language

Paper • 2510.12766 • Published 26 days ago • 23
The Dragon Hatchling: The Missing Link between the Transformer and Models of the Brain

Paper • 2509.26507 • Published Sep 30 • 526
Less is More: Recursive Reasoning with Tiny Networks

Paper • 2510.04871 • Published Oct 6 • 468
Scaling Latent Reasoning via Looped Language Models

Paper • 2510.25741 • Published 11 days ago • 202

Pass@k Training for Adaptively Balancing Exploration and Exploitation of Large Reasoning Models

Paper • 2508.10751 • Published Aug 14 • 28
Reinforcement Pre-Training

Paper • 2506.08007 • Published Jun 9 • 262
MCP-Universe: Benchmarking Large Language Models with Real-World Model Context Protocol Servers

Paper • 2508.14704 • Published Aug 20 • 42
AgentFly: Fine-tuning LLM Agents without Fine-tuning LLMs

Paper • 2508.16153 • Published Aug 22 • 154

GUI-G^2: Gaussian Reward Modeling for GUI Grounding

Paper • 2507.15846 • Published Jul 21 • 132
WebWatcher: Breaking New Frontier of Vision-Language Deep Research Agent

Paper • 2508.05748 • Published Aug 7 • 137
Mobile-Agent-v3: Foundamental Agents for GUI Automation

Paper • 2508.15144 • Published Aug 21 • 64
AgentFly: Fine-tuning LLM Agents without Fine-tuning LLMs

Paper • 2508.16153 • Published Aug 22 • 154

ByteDance Papers

ByteDance papers collection

Contrastive Learning for Many-to-many Multilingual Neural Machine Translation

Paper • 2105.09501 • Published May 20, 2021
Cross-modal Contrastive Learning for Speech Translation

Paper • 2205.02444 • Published May 5, 2022
ByteTransformer: A High-Performance Transformer Boosted for Variable-Length Inputs

Paper • 2210.03052 • Published Oct 6, 2022
Diffusion Glancing Transformer for Parallel Sequence to Sequence Learning

Paper • 2212.10240 • Published Dec 20, 2022 • 1

Sparse Autoencoders Learn Monosemantic Features in Vision-Language Models

Paper • 2504.02821 • Published Apr 3 • 9
TimeChat-Online: 80% Visual Tokens are Naturally Redundant in Streaming Videos

Paper • 2504.17343 • Published Apr 24 • 13
ViSMaP: Unsupervised Hour-long Video Summarisation by Meta-Prompting

Paper • 2504.15921 • Published Apr 22 • 7
Causal-Copilot: An Autonomous Causal Analysis Agent

Paper • 2504.13263 • Published Apr 17 • 7

Previous
1
2
Next

Company

TOS Privacy About Jobs

Website

Models Datasets Spaces Pricing Docs