Tensorized Transformer (W103): an AI model for WikiText-103

Q: Who developed Tensorized Transformer (W103)?

Tensorized Transformer (W103) was developed by Tianjin University, Microsoft Research Asia, and Beijing Institute of Technology (all in China).

Q: What tasks does Tensorized Transformer (W103) address?

Language modeling, machine translation

// tasks

Language modeling, machine translation

// description

A Tensorized Transformer model optimized for the large-scale WikiText-103 dataset. It targets language modeling and machine translation, generating coherent text while using GPU memory economically.

// abstract

Latest development of neural models has connected the encoder and decoder through a self-attention mechanism. In particular, Transformer, which is solely based on self-attention, has led to breakthroughs in Natural Language Processing (NLP) tasks. However, the multi-head attention mechanism, as a key component of Transformer, limits the effective deployment of the model to a resource-limited setting. In this paper, based on the ideas of tensor decomposition and parameters sharing, we propose a novel self-attention model (namely Multi-linear attention) with Block-Term Tensor Decomposition (BTD). We test and verify the proposed attention method on three language modeling tasks (i.e., PTB, WikiText-103 and One-billion) and a neural machine translation task (i.e., WMT-2016 English-German). Multi-linear attention can not only largely compress the model parameters but also obtain performance improvements, compared with a number of language modeling approaches, such as Transformer, Transformer-XL, and Transformer with tensor train decomposition.
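
The abstract names Block-Term Tensor Decomposition (BTD) as the basis of the compression. As a rough illustration of the general idea only, the sketch below rebuilds a third-order tensor from a sum of small Tucker terms and counts the stored parameters. It is not the paper's Multi-linear attention: the function names `btd_reconstruct` and `btd_param_count` are hypothetical, every core is assumed to be cubic (rank x rank x rank), and plain Python lists are used only to keep the example dependency-free.

```python
from itertools import product


def btd_reconstruct(blocks):
    """Rebuild a 3rd-order tensor as a sum of Tucker terms.

    Each block is (core, fac_a, fac_b, fac_c), where core is rank x rank x rank
    (an assumption of this sketch) and the factors are I x rank, J x rank and
    K x rank nested lists.
    """
    _, a0, b0, c0 = blocks[0]
    dim_i, dim_j, dim_k = len(a0), len(b0), len(c0)
    tensor = [[[0.0] * dim_k for _ in range(dim_j)] for _ in range(dim_i)]
    for core, fac_a, fac_b, fac_c in blocks:
        rank = len(core)
        for i, j, k in product(range(dim_i), range(dim_j), range(dim_k)):
            # Contract the small core with one row from each factor matrix.
            tensor[i][j][k] += sum(
                core[a][b][c] * fac_a[i][a] * fac_b[j][b] * fac_c[k][c]
                for a, b, c in product(range(rank), repeat=3)
            )
    return tensor


def btd_param_count(num_blocks, rank, dims):
    """Stored parameters: per block, one cubic core plus three factor matrices."""
    return num_blocks * (rank ** 3 + rank * sum(dims))


# Illustrative sizes only, not taken from the paper.
dims = (64, 64, 64)
dense = dims[0] * dims[1] * dims[2]        # 262144 entries in the full tensor
compressed = btd_param_count(2, 3, dims)   # 1206 parameters in two rank-3 blocks
print(dense, compressed)
```

With these illustrative sizes the decomposition stores 1206 numbers instead of 262144, which is the kind of saving the abstract refers to when it says the model parameters can be largely compressed. In practice an array library would replace the nested loops.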
