Language model

Transformer + Average Attention Network

University of Electronic Science and Technology of China
Language modeling

This language model rethinks the classic Transformer architecture with a focus on training efficiency. By combining self-attention with fully connected networks, this solution from Chinese researchers delivers impressive text-processing speed without sacrificing quality.

To date, the dominant approaches to language modeling have been based on recurrent or convolutional neural networks. We present two simple models inspired by the Transformer [1]. Compared to other attention networks, the Transformer, which uses only self-attention and a feedforward network (FFN), is highly efficient to train. We apply the ideas behind the Transformer's mechanism to language modeling: future elements are predicted from the long-term dependencies of context words, captured through an AAN (average attention network) and the self-attention mechanism. Our model achieves a perplexity of 22.13 on WikiText-103 and 26.31 on the Google Billion Word benchmark.
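To illustrate the average attention idea mentioned above, here is a minimal PyTorch sketch (not code from the paper; the cumulative-mean-plus-gating layout follows the common average attention network formulation, and all names and layer sizes here are illustrative assumptions). Each position attends to the running mean of all tokens up to and including itself, so the layer stays causal while costing O(1) per decoding step instead of attending over the whole prefix.

```python
import torch
import torch.nn as nn

class AverageAttention(nn.Module):
    """Sketch of an average attention network (AAN) layer.

    Instead of learned attention weights over past positions, each
    position uses the cumulative average of the prefix, followed by
    a gating layer that mixes it with the original token embedding.
    """
    def __init__(self, d_model: int):
        super().__init__()
        # Gating projection over [token; average]; sizes are assumptions.
        self.gate = nn.Linear(2 * d_model, 2 * d_model)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, d_model)
        seq_len = x.size(1)
        # Cumulative mean preserves causality: position j only sees
        # tokens 1..j.
        counts = torch.arange(1, seq_len + 1, device=x.device, dtype=x.dtype)
        avg = x.cumsum(dim=1) / counts.view(1, -1, 1)
        # Input and forget gates decide how much of the token itself
        # vs. its context average flows to the next layer.
        i, f = self.gate(torch.cat([x, avg], dim=-1)).chunk(2, dim=-1)
        return torch.sigmoid(i) * x + torch.sigmoid(f) * avg

# Usage: one causal pass over a toy batch.
layer = AverageAttention(d_model=64)
out = layer(torch.randn(2, 10, 64))  # -> shape (2, 10, 64)
```

In a full language model, a layer like this would replace or complement the self-attention sublayer inside each Transformer block, with the FFN sublayer kept unchanged.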
