XVERSE-13B-2: a powerful multilingual large language model (LLM)

Q: Who developed XVERSE-13B-2?

The XVERSE-13B-2 model was developed by XVERSE Technology, Shenzhen Yuanxiang Technology (China).

Q: What tasks does XVERSE-13B-2 handle?

Language generation, text generation, question answering, text summarization, machine translation

// description

XVERSE-13B-2 is a multilingual language model with an 8k-token context window. It is well suited to long dialogues, text summarization, and complex machine translation, offering generation accuracy on par with leading solutions.
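To make the 8k-token window concrete, here is a minimal sketch of how a long dialogue might be trimmed to fit it. This is an illustration, not part of the XVERSE tooling: `trim_history`, `count_tokens_whitespace`, and the 512-token reply reserve are hypothetical, "8k" is assumed to mean 8192 tokens, and whitespace word counting stands in for the model's real tokenizer.

```python
from typing import Callable, List

CONTEXT_WINDOW = 8192  # assumption: "8k" is taken to mean 8192 tokens


def count_tokens_whitespace(text: str) -> int:
    """Crude stand-in for a real tokenizer: counts whitespace-separated words."""
    return len(text.split())


def trim_history(
    turns: List[str],
    max_tokens: int = CONTEXT_WINDOW,
    reserve_for_reply: int = 512,  # hypothetical budget left for the model's answer
    count_tokens: Callable[[str], int] = count_tokens_whitespace,
) -> List[str]:
    """Keep the most recent turns whose combined token count fits the budget."""
    budget = max_tokens - reserve_for_reply
    kept: List[str] = []
    used = 0
    # Walk from newest to oldest so the most recent context is kept first.
    for turn in reversed(turns):
        cost = count_tokens(turn)
        if used + cost > budget:
            break
        kept.append(turn)
        used += cost
    kept.reverse()  # restore chronological order
    return kept
```

In practice the `count_tokens` argument would be replaced by a call to the model's own tokenizer, since word counts usually underestimate token counts.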

// abstract

XVERSE-13B is a multilingual large language model independently developed by Shenzhen Yuanxiang Technology. Its key features are as follows:

Model Structure: XVERSE-13B uses the mainstream decoder-only Transformer architecture and supports an 8k context length, the longest among models of the same size, which meets the needs of longer multi-round dialogues, knowledge question answering, and summarization. This makes the model more versatile across application scenarios.

Training Data: The model was trained on a diverse, high-quality dataset of 3.2 trillion tokens covering more than 40 languages, including Chinese, English, Russian, and Spanish. The sampling ratio of the different data types is finely tuned, which yields excellent performance in Chinese and English while also accounting for the other languages.

Tokenization: A BPE (Byte-Pair Encoding) tokenizer with a vocabulary size of 100,534 was trained on hundreds of gigabytes of language data. It supports multiple languages without additional vocabulary expansion.

Training Framework: Several key technologies were also developed in-house, including efficient operators, memory optimization, parallel scheduling strategies, overlap of data loading, computation, and communication, and synergy between platforms and frameworks. These advances improve training efficiency and model stability. With them, the peak compute utilization rate on a thousand-card cluster can reach 58.5%, ranking at the forefront of the industry.
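The abstract names BPE (Byte-Pair Encoding) as the basis of the tokenizer. As a rough illustration of the general technique only, the sketch below learns merge rules on a toy corpus. It is not the XVERSE tokenizer: it works on characters rather than bytes, and `train_bpe`, `count_pairs`, `merge_pair`, and the tie-breaking rule are assumptions made for the example.

```python
from collections import Counter
from typing import Dict, List, Tuple

Pair = Tuple[str, str]
Word = Tuple[str, ...]


def count_pairs(words: Dict[Word, int]) -> Counter:
    """Count adjacent symbol pairs, weighted by how often each word occurs."""
    pairs: Counter = Counter()
    for symbols, freq in words.items():
        for left, right in zip(symbols, symbols[1:]):
            pairs[(left, right)] += freq
    return pairs


def merge_pair(words: Dict[Word, int], pair: Pair) -> Dict[Word, int]:
    """Replace every occurrence of `pair` with a single merged symbol."""
    merged: Dict[Word, int] = {}
    for symbols, freq in words.items():
        out: List[str] = []
        i = 0
        while i < len(symbols):
            if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
                out.append(symbols[i] + symbols[i + 1])
                i += 2
            else:
                out.append(symbols[i])
                i += 1
        key = tuple(out)
        merged[key] = merged.get(key, 0) + freq
    return merged


def train_bpe(corpus: List[str], num_merges: int) -> List[Pair]:
    """Learn up to `num_merges` merge rules from a list of words."""
    # Start with each word split into single characters.
    words: Dict[Word, int] = Counter(tuple(word) for word in corpus)
    merges: List[Pair] = []
    for _ in range(num_merges):
        pairs = count_pairs(words)
        if not pairs:
            break  # every word is already a single symbol
        # Most frequent pair; ties go to the lexicographically larger pair
        # (an arbitrary choice made here for determinism).
        best = max(pairs, key=lambda p: (pairs[p], p))
        merges.append(best)
        words = merge_pair(words, best)
    return merges
```

Each learned merge adds one entry to the vocabulary, so a production tokenizer reaches a vocabulary on the order of 100,000 entries by running many such merges over a far larger corpus.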
