Language model

DeepSeek-V4-Flash

DeepSeek
Text generation · Question answering

DeepSeek-V4-Flash is the high-speed model in the new AI lineup, optimized for fast text generation and question answering. It combines compactness (only 13B activated parameters) with strong capability, handling a context of one million tokens without loss of performance.

We present a preview version of the DeepSeek-V4 series, including two strong Mixture-of-Experts (MoE) language models — DeepSeek-V4-Pro with 1.6T parameters (49B activated) and DeepSeek-V4-Flash with 284B parameters (13B activated) — both supporting a context length of one million tokens. The DeepSeek-V4 series incorporates several key upgrades in architecture and optimization: (1) a hybrid attention architecture that combines Compressed Sparse Attention (CSA) and Heavily Compressed Attention (HCA) to improve long-context efficiency; (2) Manifold-Constrained Hyper-Connections (mHC) that enhance conventional residual connections; and (3) the Muon optimizer for faster convergence and greater training stability. We pre-train both models on more than 32T diverse and high-quality tokens, followed by a comprehensive post-training pipeline that unlocks and further enhances their capabilities. DeepSeek-V4-ProMax, the maximum-reasoning-effort mode of DeepSeek-V4-Pro, redefines the state of the art for open models, outperforming its predecessors on core tasks. Meanwhile, the DeepSeek-V4 series is highly efficient in long-context scenarios. In the one-million-token context setting, DeepSeek-V4-Pro requires only 27% of the single-token inference FLOPs and 10% of the KV cache compared with DeepSeek-V3.2. This enables us to routinely support one-million-token contexts, thereby making long-horizon tasks and further test-time scaling more feasible. The model checkpoints are available at https://huggingface.co/collections/deepseek-ai/deepseek-v4.
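The efficiency claim above can be made concrete with simple arithmetic. The sketch below only applies the 27% FLOPs and 10% KV-cache ratios stated in the abstract; the baseline cost values passed in are hypothetical placeholders, not real measurements of DeepSeek-V3.2.

```python
# Illustrative sketch of the long-context efficiency ratios quoted in the
# abstract. The ratios come from the text; baseline costs are hypothetical.

def v4_pro_costs(baseline_flops: float, baseline_kv_bytes: float) -> tuple[float, float]:
    """Scale hypothetical DeepSeek-V3.2 per-token costs at a one-million-token
    context by the ratios reported for DeepSeek-V4-Pro."""
    FLOPS_RATIO = 0.27     # 27% of V3.2's single-token inference FLOPs
    KV_CACHE_RATIO = 0.10  # 10% of V3.2's KV cache
    return baseline_flops * FLOPS_RATIO, baseline_kv_bytes * KV_CACHE_RATIO

# Example: a (made-up) baseline of 100 units of FLOPs and 100 GB of KV cache
flops, kv_cache = v4_pro_costs(100.0, 100.0)
```

In other words, at the million-token context length the reported design cuts per-token compute by roughly 3.7x and KV-cache memory by 10x relative to DeepSeek-V3.2, which is what makes routinely serving such contexts practical.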

What is DeepSeek-V4-Flash?
Who developed DeepSeek-V4-Flash?
What tasks does DeepSeek-V4-Flash solve?