Language model

Transformer + Average Attention Network

University of Electronic Science and Technology of China
Language modeling

This language model rethinks the classic Transformer architecture with a focus on training efficiency. By combining self-attention with fully connected networks, this solution from Chinese researchers delivers impressive text-processing speed without sacrificing quality.

To date, the dominant approaches to language modeling have been based on recurrent or convolutional neural networks. We present two simple models inspired by the Transformer [1]. Compared to other attention networks, the Transformer, which uses only self-attention and a feedforward network (FFN), is highly efficient to train. We apply the ideas behind the Transformer's mechanism to language modeling: future elements are predicted from the long-term dependencies of context words, captured through an AAN (average attention network) and the self-attention mechanism. Our model achieves a perplexity of 22.13 on WikiText-103 and 26.31 on the Google Billion Word benchmark.
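To illustrate the average attention idea mentioned above, here is a minimal PyTorch sketch (not code from the paper; the cumulative-mean-plus-gating layout follows the common average attention network formulation, and all names and layer sizes here are illustrative assumptions). Each position attends to the running mean of all tokens up to and including itself, so the layer stays causal while costing O(1) per decoding step instead of attending over the whole prefix.

```python
import torch
import torch.nn as nn

class AverageAttention(nn.Module):
    """Sketch of an average attention network (AAN) layer.

    Instead of learned attention weights over past positions, each
    position uses the cumulative average of the prefix, followed by
    a gating layer that mixes it with the original token embedding.
    """
    def __init__(self, d_model: int):
        super().__init__()
        # Gating projection over [token; average]; sizes are assumptions.
        self.gate = nn.Linear(2 * d_model, 2 * d_model)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, d_model)
        seq_len = x.size(1)
        # Cumulative mean preserves causality: position j only sees
        # tokens 1..j.
        counts = torch.arange(1, seq_len + 1, device=x.device, dtype=x.dtype)
        avg = x.cumsum(dim=1) / counts.view(1, -1, 1)
        # Input and forget gates decide how much of the token itself
        # vs. its context average flows to the next layer.
        i, f = self.gate(torch.cat([x, avg], dim=-1)).chunk(2, dim=-1)
        return torch.sigmoid(i) * x + torch.sigmoid(f) * avg

# Usage: one causal pass over a toy batch.
layer = AverageAttention(d_model=64)
out = layer(torch.randn(2, 10, 64))  # -> shape (2, 10, 64)
```

In a full language model, a layer like this would replace or complement the self-attention sublayer inside each Transformer block, with the FFN sublayer kept unchanged.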
