The QRNN architecture combines the strengths of recurrent and convolutional networks for very fast language modeling. This approach addresses the slow training of LSTMs, making it possible to process huge text corpora efficiently.
Word-level language modeling (WLM) is one of the foundational tasks of unsupervised natural language processing. Most modern architectures for WLM use several LSTM layers followed by a softmax layer. Even with larger batch sizes and a multi-GPU setup, training these networks on large-vocabulary corpora is slow, due to the increased computation in the softmax and the high cost of the recurrence. We propose a model architecture and training strategy that achieves state-of-the-art performance on the WikiText-103 dataset using a single GPU, while being substantially faster than an NVIDIA cuDNN LSTM-based model, by combining the Quasi-Recurrent Neural Network (QRNN), an adaptive softmax with weight tying, and longer sequences within batches.
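To make the speed claim concrete: in a QRNN, the gate pre-activations for all timesteps are computed in parallel (as convolutions over the sequence), and only a cheap element-wise "fo-pooling" recurrence remains sequential, in contrast to an LSTM whose full matrix multiplies sit inside the recurrence. Below is a minimal NumPy sketch of a single QRNN layer; for simplicity it uses a convolution window of 1 (a per-timestep matmul) rather than the wider windows used in practice, and all names and shapes are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

def qrnn_layer(x, W_z, W_f, W_o):
    """Single QRNN layer, window size 1 (illustrative sketch).

    x: (T, d_in) input sequence; each weight matrix: (d_in, d_h).
    All T timesteps' gates come from one big matmul (the parallel,
    convolution-like part); only the element-wise fo-pooling loop
    below is sequential, and it involves no matrix products.
    """
    z = np.tanh(x @ W_z)                   # candidate values, (T, d_h)
    f = 1.0 / (1.0 + np.exp(-(x @ W_f)))   # forget gates in (0, 1)
    o = 1.0 / (1.0 + np.exp(-(x @ W_o)))   # output gates in (0, 1)

    c = np.zeros(z.shape[1])               # cell state
    h = np.empty_like(z)
    for t in range(z.shape[0]):            # fo-pooling recurrence
        c = f[t] * c + (1.0 - f[t]) * z[t]
        h[t] = o[t] * c
    return h

T, d_in, d_h = 5, 4, 3
x = rng.standard_normal((T, d_in))
W_z, W_f, W_o = (rng.standard_normal((d_in, d_h)) * 0.1 for _ in range(3))
h = qrnn_layer(x, W_z, W_f, W_o)
print(h.shape)  # (5, 3): one hidden vector per timestep
```

Because `c` is a convex combination of `tanh` outputs, the hidden states stay bounded in (-1, 1), and the sequential loop touches only vectors, which is what makes the recurrence cheap relative to an LSTM.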