Яндекс Метрика
Языковая модель

Selfish-RNN (AWD-LSTM-MoS)

Eindhoven University of Technology
Языковое моделирование

Selfish-RNN — это оптимизированная языковая модель, использующая метод динамического разреженного обучения (DST) для снижения нагрузки на вычислительные системы. Этот AI-алгоритм позволяет эффективно тренировать нейросети без необходимости предварительного обучения тяжелых «плотных» моделей, сохраняя высокую точность.

Sparse neural networks have been widely applied to reduce the computational demands of training and deploying over-parameterized deep neural networks. For inference acceleration, methods that discover a sparse network from a pre-trained dense network (dense-to-sparse training) work effectively. Recently, dynamic sparse training (DST) has been proposed to train sparse neural networks without pre-training a dense model (sparse-to-sparse training), so that the training process can also be accelerated. However, previous sparse-to-sparse methods mainly focus on Multilayer Perceptron Networks (MLPs) and Convolutional Neural Networks (CNNs), failing to match the performance of dense-to-sparse methods in the Recurrent Neural Networks (RNNs) setting. In this paper, we propose an approach to train intrinsically sparse RNNs with a fixed parameter count in one single run, without compromising performance. During training, we allow RNN layers to have a non-uniform redistribution across cell gates for better regularization. Further, we propose SNT-ASGD, a novel variant of the averaged stochastic gradient optimizer, which significantly improves the performance of all sparse training methods for RNNs. Using these strategies, we achieve state-of-the-art sparse training results, better than the dense-to-sparse methods, with various types of RNNs on Penn TreeBank and Wikitext-2 datasets. Our codes are available at this https URL.

Что такое Selfish-RNN (AWD-LSTM-MoS)?+
Кто разработал Selfish-RNN (AWD-LSTM-MoS)?+
Какие задачи решает Selfish-RNN (AWD-LSTM-MoS)?+