Language model

GPT-SW3

AI Sweden, RISE
Language modeling

GPT-SW3 is the first large-scale generative language model built specifically for the North Germanic languages. It demonstrates strong zero-shot capabilities, solving text-based tasks without any prior examples. The model is an important step in the development of sovereign AI technology for Europe, providing high-quality text generation.

Large-scale generative language models such as the GPT series (Radford and Narasimhan, 2018; Radford et al., 2019; Brown et al., 2020) have enjoyed considerable attention in recent years. This is partly due to their unprecedented ability to generate coherent text, but also to their capacity for zero-shot performance on a wide range of tasks, without any training examples. A prerequisite for building such models is access to both large amounts of high-quality text data and powerful computational resources, which has proven to be a limiting factor for the development of large-scale models in languages other than English. With the goal of promoting the development of large-scale generative models for other languages, we present our work on developing and evaluating GPT-SW3, a 3.5 billion parameter autoregressive language model trained on a newly collected 100 GB Swedish corpus. To the best of our knowledge, this is the largest generative model for Swedish to date, and likely one of the larger non-English models currently available. In this paper, we collect the lessons learned from developing and evaluating this model, including challenges with data collection, training procedures, and validation activities.
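Zero-shot evaluation, as described above, means the model receives only a task instruction and the input, with no solved examples in the prompt; few-shot evaluation prefixes the same prompt with a handful of worked examples. A minimal sketch of the difference in prompt construction (the Swedish sentiment task, labels, and the "Svar:" answer cue here are illustrative assumptions, not prompts from the paper):

```python
def build_zero_shot_prompt(task_instruction: str, text: str) -> str:
    # Zero-shot: only the instruction and the input -- no solved examples.
    # "Svar" is Swedish for "Answer"; the model is expected to continue from here.
    return f"{task_instruction}\n\nText: {text}\nSvar:"

def build_few_shot_prompt(task_instruction: str, examples: list, text: str) -> str:
    # Few-shot: the same instruction, but preceded by solved (text, answer) pairs.
    shots = "\n".join(f"Text: {t}\nSvar: {a}" for t, a in examples)
    return f"{task_instruction}\n\n{shots}\n\nText: {text}\nSvar:"

# Hypothetical Swedish sentiment-classification task for illustration.
instruction = "Är följande recension positiv eller negativ?"
zero_shot = build_zero_shot_prompt(instruction, "Filmen var fantastisk!")
few_shot = build_few_shot_prompt(
    instruction,
    [("En underbar upplevelse.", "positiv"), ("Slöseri med tid.", "negativ")],
    "Filmen var fantastisk!",
)
```

The resulting strings would then be fed to the generative model, whose continuation after "Svar:" is taken as its prediction.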
