Language model

GPT2-LayerFusion-WS

University of Liverpool, University of Southern California
Language modelling

GPT2-LayerFusion-WS is an advanced neural network compression method based on "fusing" similar layers. The technique substantially reduces the size of an AI model with little loss of accuracy, making heavyweight models faster and cheaper to run.

This paper proposes \textit{layer fusion}, a model compression technique that discovers which weights to combine and then fuses the weights of similar fully-connected, convolutional and attention layers. Layer fusion can significantly reduce the number of layers of the original network with little additional computational overhead, while maintaining competitive performance. In experiments on CIFAR-10, we find that various deep convolutional neural networks can remain within 2\% accuracy points of the original networks up to a compression ratio of 3.33 when iteratively retrained with layer fusion. In experiments on the WikiText-2 language modelling dataset using pretrained transformer models, we achieve compression to 20\% of the original network size while remaining within 5 perplexity points of the original network. We also find that other well-established compression techniques can achieve competitive performance relative to their original networks given a sufficient number of retraining steps. Generally, we observe a clear inflection point in performance as the amount of compression increases, suggesting a bound on the compression achievable before an exponential degradation in performance.
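The core idea of layer fusion, as described above, is to identify layers with similar weights and merge them into one. The paper's actual criterion and fusion rule are not specified here, so the following is only a minimal illustrative sketch under simple assumptions: cosine similarity between flattened weight matrices as the similarity measure, and element-wise averaging of consecutive similar layers as the fusion step. The function names (`cosine_similarity`, `fuse_similar_layers`) and the threshold value are hypothetical, not from the paper.

```python
import numpy as np

def cosine_similarity(w1, w2):
    # Flatten both weight tensors and compare directions.
    a, b = w1.ravel(), w2.ravel()
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))

def fuse_similar_layers(weights, threshold=0.9):
    """Illustrative sketch (not the paper's exact algorithm):
    greedily group consecutive layers whose weights have the same shape
    and cosine similarity >= threshold, then replace each group with
    the element-wise mean of its members, reducing the layer count."""
    fused = []
    group = [weights[0]]
    for w in weights[1:]:
        if w.shape == group[-1].shape and cosine_similarity(w, group[-1]) >= threshold:
            group.append(w)          # similar enough: merge into current group
        else:
            fused.append(np.mean(group, axis=0))  # emit fused layer
            group = [w]              # start a new group
    fused.append(np.mean(group, axis=0))
    return fused

# Three layers: the first two nearly identical, the third very different.
layers = [np.ones((4, 4)), 1.01 * np.ones((4, 4)), -np.ones((4, 4))]
fused = fuse_similar_layers(layers)
print(len(fused))  # the two similar layers were fused into one
```

In practice, fusing weights this naively would degrade accuracy, which is why the abstract emphasizes iterative retraining after fusion to recover performance.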
