
DeepSeek LLM
DeepSeek LLM is an advanced language model comprising 67 billion parameters. It has been trained from scratch on a vast dataset of 2 trillion tokens.
DeepSeek Coder
DeepSeek Coder is composed of a series of code language models, each trained from scratch on 2T tokens (87% code and 13% natural language).
DeepSeek Math
DeepSeekMath continues pre-training for 500B tokens on math-related data sourced from Common Crawl, together with natural language and code data.
DeepSeek VL
DeepSeek-VL is an open-source Vision-Language (VL) Model designed for real-world vision and language understanding applications.
DeepSeek V2
DeepSeek-V2 is a strong Mixture-of-Experts (MoE) language model characterized by economical training and efficient inference.
DeepSeek V3
DeepSeek-V3 is a strong Mixture-of-Experts (MoE) language model with 671B total parameters, of which only 37B are activated for each token.
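
To make the sparse-activation idea concrete, here is a minimal, illustrative sketch (not DeepSeek's actual implementation) of top-k Mixture-of-Experts routing in PyTorch: each token is routed to only a few of the available expert networks, so only a small fraction of the layer's total parameters does work for any given token. All sizes, names, and the routing details below are toy assumptions for illustration only.

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class ToyMoELayer(nn.Module):
        """Toy MoE feed-forward layer: each token uses only top_k of n_experts."""
        def __init__(self, d_model=64, n_experts=8, top_k=2):
            super().__init__()
            self.top_k = top_k
            self.router = nn.Linear(d_model, n_experts, bias=False)
            self.experts = nn.ModuleList(
                nn.Sequential(nn.Linear(d_model, 4 * d_model),
                              nn.GELU(),
                              nn.Linear(4 * d_model, d_model))
                for _ in range(n_experts)
            )

        def forward(self, x):  # x: (num_tokens, d_model)
            scores = F.softmax(self.router(x), dim=-1)       # routing probabilities
            weights, idx = scores.topk(self.top_k, dim=-1)   # pick top-k experts per token
            out = torch.zeros_like(x)
            for slot in range(self.top_k):
                for e, expert in enumerate(self.experts):
                    mask = idx[:, slot] == e                 # tokens sent to expert e in this slot
                    if mask.any():
                        out[mask] += weights[mask, slot].unsqueeze(-1) * expert(x[mask])
            return out

    tokens = torch.randn(5, 64)
    print(ToyMoELayer()(tokens).shape)  # torch.Size([5, 64]); each token touched only 2 of 8 experts

Scaled up, this same principle is what allows a model with hundreds of billions of total parameters to activate only a small subset of them for each token.
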


DeepSeek LLM
An advanced language model comprising 67 billion parameters. It has been trained from scratch on a vast dataset of 2 trillion tokens.
Superior General Capabilities: DeepSeek LLM 67B Base outperforms Llama2 70B Base in areas such as reasoning, coding, math, and Chinese comprehension.
Proficient in Coding and Math: DeepSeek LLM 67B Chat exhibits outstanding performance in coding (HumanEval Pass@1: 73.78) and mathematics (GSM8K 0-shot: 84.1, MATH 0-shot: 32.6).
Mastery in Chinese Language: DeepSeek LLM 67B Chat surpasses GPT-3.5 in Chinese.

DeepSeek is a family of open-source large language models. DeepSeek-V3 adopts Multi-head Latent Attention (MLA) and the DeepSeekMoE architecture, both of which were thoroughly validated in DeepSeek-V2. Furthermore, DeepSeek-V3 pioneers an auxiliary-loss-free strategy for load balancing and sets a multi-token prediction training objective for stronger performance.
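
As a rough illustration of the auxiliary-loss-free load-balancing idea, the sketch below (a simplification under assumed details, not DeepSeek's code) keeps a per-expert bias that is added to the routing scores only when choosing the top-k experts, while the gating weights still use the unbiased scores; the bias is nudged down for overloaded experts and up for underloaded ones, so the load evens out without adding a balancing loss term to the training objective. The update rule and step size here are illustrative assumptions.

    import torch

    def select_experts(affinity, bias, top_k=2):
        """affinity: (tokens, n_experts) router scores; bias: (n_experts,)."""
        # The bias influences which experts are chosen ...
        _, idx = (affinity + bias).topk(top_k, dim=-1)
        # ... but the gating weights themselves come from the unbiased scores.
        weights = torch.gather(affinity.softmax(dim=-1), -1, idx)
        return idx, weights

    def update_bias(bias, idx, n_experts, step=0.05):
        """Push the bias down for overloaded experts, up for underloaded ones (toy step size)."""
        load = torch.bincount(idx.flatten(), minlength=n_experts).float()
        return bias - step * torch.sign(load - load.mean())

    torch.manual_seed(0)
    n_experts, bias = 8, torch.zeros(8)
    for _ in range(200):                          # simulate routing over many batches
        affinity = torch.randn(512, n_experts)
        affinity[:, 0] += 2.0                     # expert 0 is systematically favoured by the router
        idx, _ = select_experts(affinity, bias)
        bias = update_bias(bias, idx, n_experts)
    print(bias)                                                # expert 0's bias has been pushed down
    print(torch.bincount(idx.flatten(), minlength=n_experts))  # per-expert load is now roughly even
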