Bielik-11B-v2 is an advanced Polish language model with 11 billion parameters, built on Mistral-7B. Trained on 400 billion tokens, it achieves high accuracy in text generation and a deep understanding of context.
Bielik-11B-v2 is a generative text model with 11 billion parameters. It is initialized from Mistral-7B-v0.2 and trained on 400 billion tokens. The model is the product of a collaboration between the open-science/open-source project SpeakLeash and the High Performance Computing (HPC) center ACK Cyfronet AGH. It was developed and trained on Polish text corpora carefully selected and processed by the SpeakLeash team, using Polish large-scale computing infrastructure within the PLGrid environment, specifically at ACK Cyfronet AGH. The creation and training of Bielik-11B-v2 were supported by computational grant number PLG/2024/016951 and carried out on the Athena and Helios supercomputers, which provided the computational resources needed for large-scale machine learning. As a result, the model is highly capable of understanding and processing Polish, giving accurate responses and performing a variety of linguistic tasks with high precision.