XVERSE-13B-2 is a multilingual language model with an 8k-token context window. It is well suited to long dialogues, text summarization, and complex machine translation, and is presented as offering generation accuracy on par with leading models.
XVERSE-13B is a multilingual large language model, independently developed by Shenzhen Yuanxiang Technology. Its key features are as follows:
Model Structure: XVERSE-13B uses the mainstream decoder-only Transformer architecture and supports an 8k context length, described by the developers as the longest among models of the same size. This meets the needs of longer multi-round dialogues, knowledge question answering, and summarization, and makes the model usable in a wider range of application scenarios.
Training Data: The model has been trained on a diverse, high-quality dataset of 3.2 trillion tokens covering more than 40 languages, including Chinese, English, Russian, and Spanish. The sampling ratio of the different data types is finely tuned so that performance is strongest in Chinese and English while other languages are still taken into account.
Tokenization: A tokenizer with a vocabulary size of 100,534 has been trained on hundreds of gigabytes of language data using the BPE (Byte-Pair Encoding) algorithm. It supports multiple languages without the need for additional vocabulary expansion.
Training Framework: Several key technologies have been developed in-house, including efficient operators, memory optimization, parallel scheduling strategies, overlap of data, computation, and communication, and coordination between platforms and frameworks. These improve training efficiency and model stability. With them, the peak compute utilization rate on a thousand-card cluster can reach 58.5%, which the developers place at the forefront of the industry.
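The description above says the tokenizer is based on BPE. As a rough illustration of how BPE learns its merge rules, here is a minimal character-level sketch in Python. It is not the XVERSE tokenizer or its training code: the function names (`train_bpe`, `encode`), the toy corpus, and the tie-breaking rule are all assumptions made for this example, and production tokenizers typically work on bytes and much larger corpora.

```python
from collections import Counter


def get_pair_counts(words):
    """Count adjacent symbol pairs across all words, weighted by word frequency."""
    counts = Counter()
    for symbols, freq in words.items():
        for a, b in zip(symbols, symbols[1:]):
            counts[(a, b)] += freq
    return counts


def merge_pair(words, pair):
    """Replace every occurrence of `pair` with a single merged symbol."""
    merged = {}
    for symbols, freq in words.items():
        out = []
        i = 0
        while i < len(symbols):
            if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
                out.append(symbols[i] + symbols[i + 1])
                i += 2
            else:
                out.append(symbols[i])
                i += 1
        key = tuple(out)
        merged[key] = merged.get(key, 0) + freq
    return merged


def train_bpe(corpus, num_merges):
    """Learn up to `num_merges` merge rules from a whitespace-split corpus."""
    # Each word starts as a tuple of single characters (hypothetical toy setup).
    words = dict(Counter(tuple(w) for w in corpus.split()))
    merges = []
    for _ in range(num_merges):
        counts = get_pair_counts(words)
        if not counts:
            break
        # Most frequent pair wins; ties are broken by the pair itself so the
        # result is deterministic (an assumption of this sketch).
        best = max(counts, key=lambda p: (counts[p], p))
        merges.append(best)
        words = merge_pair(words, best)
    return merges


def encode(word, merges):
    """Apply the learned merges, in order, to a single word."""
    symbols = tuple(word)
    for pair in merges:
        symbols = next(iter(merge_pair({symbols: 1}, pair)))
    return list(symbols)


if __name__ == "__main__":
    rules = train_bpe("low low low lower lowest", num_merges=2)
    print(rules)
    print(encode("lower", rules))
```

Each merge adds one entry to the vocabulary, so a vocabulary of about 100k entries corresponds to a very large number of such merges learned from a large multilingual corpus. That is why frequent character sequences in many languages can end up as single tokens without later vocabulary expansion.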