DeepSeek Papers

Advancing Open-Source Language Models

Native Sparse Attention: Hardware-Aligned and Natively Trainable Sparse Attention (Deep Dive Coming Soon)

Released: February 2025

Introduces a sparse attention mechanism that is both hardware-efficient and natively trainable, improving the efficiency of long-context training and inference in large language models.
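A minimal sketch of the block-selection idea behind this style of sparse attention: score key/value blocks with a coarse (mean-pooled) representation and let each query attend only to its top-k blocks. The function name, pooling choice, and shapes are assumptions for illustration; the actual NSA design also has compression and sliding-window branches and a hardware-aligned kernel, none of which are reproduced here.

```python
import torch
import torch.nn.functional as F

def blockwise_topk_attention(q, k, v, block_size=64, top_k=4):
    """Illustrative block-sparse attention: each query attends only to the
    top-k key/value blocks, ranked by a coarse mean-pooled block score.
    A simplified sketch of per-query block selection, not the NSA kernel."""
    T, d = k.shape
    n_blocks = T // block_size
    # Coarse block representations: mean-pool keys within each block.
    k_blocks = k[: n_blocks * block_size].reshape(n_blocks, block_size, d).mean(dim=1)
    # Score blocks per query and keep the top-k block indices.
    block_scores = q @ k_blocks.T                           # (Tq, n_blocks)
    top_blocks = block_scores.topk(top_k, dim=-1).indices   # (Tq, top_k)

    out = torch.zeros(q.shape[0], v.shape[1])
    for i in range(q.shape[0]):
        # Token indices covered by the selected blocks for this query.
        idx = (top_blocks[i][:, None] * block_size
               + torch.arange(block_size)).reshape(-1)
        attn = F.softmax(q[i] @ k[idx].T / d ** 0.5, dim=-1)
        out[i] = attn @ v[idx]
    return out

q = torch.randn(128, 32); k = torch.randn(128, 32); v = torch.randn(128, 32)
print(blockwise_topk_attention(q, k, v, block_size=16, top_k=2).shape)  # torch.Size([128, 32])
```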

DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning (Deep Dive Coming Soon)

Released: January 20, 2025

Builds on DeepSeek-V3 to strengthen reasoning capabilities through large-scale reinforcement learning, achieving performance competitive with leading models such as OpenAI's o1.

DeepSeek-V3 Technical Report (Deep Dive Coming Soon)

Released: December 2024

Describes the scaling of a sparse MoE network to 671 billion parameters, using mixed-precision training and high-performance computing (HPC) co-design strategies.
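For reference on the mixed-precision ingredient, below is the generic PyTorch autocast plus gradient-scaling pattern, assuming a CUDA device is available. It only sketches mixed-precision training in general; DeepSeek-V3's actual recipe uses a custom FP8 framework co-designed with its training infrastructure, which this snippet does not reproduce.

```python
import torch
from torch import nn

# Generic mixed-precision training loop (autocast + GradScaler).
# Illustrative only; not DeepSeek-V3's FP8 training framework.
model = nn.Linear(1024, 1024).cuda()
opt = torch.optim.AdamW(model.parameters(), lr=1e-4)
scaler = torch.cuda.amp.GradScaler()

for step in range(10):
    x = torch.randn(32, 1024, device="cuda")
    with torch.autocast(device_type="cuda", dtype=torch.float16):
        loss = model(x).pow(2).mean()      # forward pass in reduced precision
    scaler.scale(loss).backward()          # scaled backward to avoid underflow
    scaler.step(opt)                       # unscale gradients, then step
    scaler.update()
    opt.zero_grad(set_to_none=True)
```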

DeepSeek-V2: A Strong, Economical, and Efficient Mixture-of-Experts Language Model (Deep Dive Coming Soon)

Released: May 2024

Introduces a Mixture-of-Experts (MoE) architecture that maintains strong performance while reducing training costs by 42%, alongside broader efficiency improvements in training and inference.

DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models (Deep Dive Coming Soon)

Released: April 2024

Presents methods to improve mathematical reasoning in LLMs, introducing the Group Relative Policy Optimization (GRPO) algorithm for the reinforcement learning stage.
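The core of GRPO, as described in the paper, is replacing a learned critic with a group-relative baseline: sample several responses per prompt and normalize each response's reward by the group's mean and standard deviation. A minimal sketch of that advantage computation follows; the tensor shapes and epsilon are illustrative, and the full clipped surrogate objective and KL penalty are omitted.

```python
import torch

def group_relative_advantages(rewards: torch.Tensor) -> torch.Tensor:
    """Group-relative advantages: each sampled response is scored against the
    statistics of its own group, with no value/critic model.
    `rewards` has shape (num_prompts, group_size)."""
    mean = rewards.mean(dim=-1, keepdim=True)
    std = rewards.std(dim=-1, keepdim=True)
    return (rewards - mean) / (std + 1e-8)

# Example: 2 prompts, 4 sampled responses each, one scalar reward per response.
rewards = torch.tensor([[1.0, 0.0, 0.0, 1.0],
                        [0.2, 0.8, 0.5, 0.5]])
print(group_relative_advantages(rewards))
```

These advantages are then plugged into a PPO-style policy-gradient objective over the sampled responses.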

DeepSeek LLM: Scaling Open-Source Language Models with Longtermism (Deep Dive Coming Soon)

Released: November 29, 2023

This foundational paper explores scaling laws and the trade-offs between data and model size, establishing the groundwork for subsequent models.
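The kind of relationship the paper studies can be written as compute-allocation power laws: for a training budget C, the compute-optimal model scale and data scale are modeled as M ∝ C^a and D ∝ C^b. The sketch below only illustrates the functional form; the helper function and exponents are placeholders, not the paper's fitted values.

```python
# Illustrative compute-allocation power law (placeholder exponents, not the
# paper's fitted coefficients): M ∝ C^a (model scale), D ∝ C^b (data scale),
# with a + b ≈ 1 so that training compute C ≈ M * D is respected.
def optimal_allocation(compute_flops: float, a: float = 0.5, b: float = 0.5):
    model_scale = compute_flops ** a   # e.g. non-embedding FLOPs per token
    data_scale = compute_flops ** b    # e.g. number of training tokens
    return model_scale, data_scale

for c in (1e20, 1e21, 1e22):
    m, d = optimal_allocation(c)
    print(f"C={c:.0e}  model~{m:.2e}  data~{d:.2e}")
```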

DeepSeek-Prover: Advancing Theorem Proving in LLMs through Large-Scale Synthetic Data (Deep Dive Coming Soon)

Enhances theorem-proving capabilities in language models by training on large-scale synthetic proof data for the Lean 4 proof assistant, establishing new benchmarks in automated mathematical reasoning.
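For context, the formal target of such synthetic data is machine-checkable statements and proofs in Lean 4. The toy theorems below merely illustrate the statement/proof format; they are not drawn from the paper's dataset.

```lean
-- Toy Lean 4 theorem/proof pairs of the kind targeted by synthetic proof data
-- (illustrative examples only).
theorem add_comm_example (a b : Nat) : a + b = b + a :=
  Nat.add_comm a b

theorem zero_le_example (n : Nat) : 0 ≤ n :=
  Nat.zero_le n
```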

DeepSeek-Coder-V2: Breaking the Barrier of Closed-Source Models in Code Intelligence (Deep Dive Coming Soon)

Details advances in code intelligence with an open-source approach, showing that open models can close the gap with closed-source systems while extending the capabilities of the earlier DeepSeek-Coder models.

DeepSeekMoE: Advancing Mixture-of-Experts Architecture (Deep Dive Coming Soon)

Presents the DeepSeekMoE architecture, which combines fine-grained expert segmentation with shared expert isolation to improve expert specialization, scalability, and training efficiency within the DeepSeek framework.
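A minimal sketch of a DeepSeekMoE-style layer, assuming the combination of always-active shared experts plus top-k routing over a pool of fine-grained experts. The class name, dimensions, and expert counts are illustrative; load-balancing objectives and expert parallelism are omitted.

```python
import torch
import torch.nn.functional as F
from torch import nn

class TinyMoELayer(nn.Module):
    """Sketch of shared-plus-routed expert mixing: shared experts process every
    token, while each token additionally activates only its top-k routed experts."""
    def __init__(self, d=64, n_shared=1, n_routed=8, top_k=2):
        super().__init__()
        self.shared = nn.ModuleList(nn.Linear(d, d) for _ in range(n_shared))
        self.routed = nn.ModuleList(nn.Linear(d, d) for _ in range(n_routed))
        self.gate = nn.Linear(d, n_routed, bias=False)
        self.top_k = top_k

    def forward(self, x):                        # x: (tokens, d)
        out = sum(e(x) for e in self.shared)     # shared experts: every token
        probs = F.softmax(self.gate(x), dim=-1)  # routing scores over routed experts
        w, idx = probs.topk(self.top_k, dim=-1)  # top-k experts per token
        for k in range(self.top_k):
            for e_id in idx[:, k].unique():
                mask = idx[:, k] == e_id         # tokens routed to this expert
                out[mask] += w[mask, k].unsqueeze(-1) * self.routed[int(e_id)](x[mask])
        return out

x = torch.randn(10, 64)
print(TinyMoELayer()(x).shape)   # torch.Size([10, 64])
```

Only the selected experts run on each token, which is what lets total parameter count grow without a proportional increase in per-token compute.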