Language model

Nemotron-H 47B

NVIDIA
Text generation · Question answering · Machine translation · Quantitative reasoning · Code generation · Neural Architecture Search (NAS)

An advanced language model from NVIDIA that uses neural architecture search (NAS) to balance accuracy and inference speed. Thanks to its hybrid architecture, the model delivers strong reasoning and high-quality machine translation while remaining cost-effective to run.

As inference-time scaling becomes critical for enhanced reasoning capabilities, it is increasingly important to build models that are efficient to infer. We introduce Nemotron-H, a family of 8B and 56B/47B hybrid Mamba-Transformer models designed to reduce inference cost for a given accuracy level. To achieve this goal, we replace the majority of self-attention layers in the common Transformer architecture with Mamba layers that perform constant computation and require constant memory per generated token. We show that Nemotron-H models offer better or on-par accuracy compared to other similarly sized state-of-the-art open-source Transformer models (e.g., Qwen-2.5-7B/72B and Llama-3.1-8B/70B) while being up to 3× faster at inference. To further increase inference speed and reduce the memory required at inference time, we created Nemotron-H-47B-Base from the 56B model using a new pruning-and-distillation compression technique called MiniPuzzle. Nemotron-H-47B-Base achieves accuracy similar to the 56B model but is 20% faster to infer. In addition, we introduce an FP8-based training recipe and show that it achieves results on par with BF16-based training; this recipe is used to train the 56B model. We are releasing the Nemotron-H base model checkpoints with support in Hugging Face and NeMo.
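To make the constant-memory claim concrete, the toy sketch below (not Nemotron-H's actual implementation; all shapes and parameter names are illustrative, and real Mamba layers use input-dependent selective parameters rather than this fixed linear recurrence) contrasts a state-space layer, whose recurrent state has a fixed size, with a self-attention KV cache, which grows with every generated token:

```python
# Toy illustration: why a Mamba-style state-space layer needs constant
# memory per generated token while a self-attention KV cache grows
# linearly with the number of decoded tokens.
import numpy as np

d_model, d_state = 8, 16
rng = np.random.default_rng(0)

# Fixed per-layer parameters (shapes are illustrative only).
A = rng.standard_normal((d_state, d_state)) * 0.01  # state transition
B = rng.standard_normal((d_state, d_model)) * 0.01  # input projection
C = rng.standard_normal((d_model, d_state)) * 0.01  # output projection

def ssm_step(h, x):
    """One decode step: the recurrent state h has a fixed size, so
    memory does not grow with the number of generated tokens."""
    h = A @ h + B @ x
    y = C @ h
    return h, y

h = np.zeros(d_state)
kv_cache = []  # what a self-attention layer must keep instead

for t in range(1000):
    x = rng.standard_normal(d_model)
    h, y = ssm_step(h, x)                  # O(1) memory: just h
    kv_cache.append((x.copy(), x.copy()))  # attention: O(t) memory

print("state-space memory (floats):", h.size)                    # constant
print("KV-cache memory (floats):", 2 * d_model * len(kv_cache))  # grows with t
```

Running the sketch prints a fixed state size for the recurrent layer and a cache size proportional to the number of decoded tokens, which is the intuition behind replacing most attention layers with Mamba layers to cut per-token inference cost.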
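Since the checkpoints are released with Hugging Face support, loading the model might look like the following minimal sketch. The repo id "nvidia/Nemotron-H-47B-Base-8K" and the need for trust_remote_code are assumptions based on NVIDIA's naming conventions; check the Hugging Face hub for the exact identifier and requirements:

```python
# Minimal sketch (assumed repo id and flags) of loading a Nemotron-H
# base checkpoint with Hugging Face transformers.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "nvidia/Nemotron-H-47B-Base-8K"  # assumed; verify on the hub

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # base models are trained in BF16/FP8
    device_map="auto",           # shard across available GPUs
    trust_remote_code=True,      # may be needed for the hybrid architecture
)

inputs = tokenizer(
    "The key advantage of hybrid Mamba-Transformer models is",
    return_tensors="pt",
).to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```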
