CPM-Ant is an open 10-billion-parameter language model optimized for Chinese. Developed by BAAI and Tsinghua University, it stands out for its energy efficiency and strong text-generation performance. This model shows that powerful neural networks can be both eco-friendly and accessible.
CPM-Ant is an open-source Chinese pre-trained language model (PLM) with 10B parameters. It is also the first milestone of the live training process of CPM-Live. The training process was cost-effective and environment-friendly, and CPM-Ant achieves promising results with delta tuning on the CUGE benchmark. Besides the full model, we also provide various compressed versions to meet the requirements of different hardware configurations. The code, log files, and checkpoints of CPM-Ant are available under an open license. More specifically, CPM-Ant is:

- Efficient: BMTrain lets us take full advantage of distributed computing power to train big models efficiently. Training CPM-Ant took 68 days and cost 430K RMB, much lower than the cost of existing model training practices. The greenhouse gas (GHG) emissions of training CPM-Ant are about 4,872 kg CO2e, compared with 46.7 t CO2e for training T5-11B.

- Effective: OpenDelta lets us adapt CPM-Ant to downstream tasks through delta tuning. In our experiments, by tuning only 6.3 million parameters, CPM-Ant achieved the best performance on 3 of the 6 tasks in the CUGE benchmark, outperforming baselines (CPM2 with 11B parameters and Yuan 1.0 with 245B parameters) that tune all parameters.

- Economical: BMCook and BMInf let us run CPM-Ant with limited computing resources. With BMInf, we can efficiently perform big-model inference on a single GPU (even a consumer-level GPU such as a GTX 1060) instead of a computing cluster. To make deployment even more economical, we use BMCook to further compress the original 10B CPM-Ant into multiple versions; these compressed checkpoints (7B, 3B, 1B, 300M) can meet the requirements of various low-resource scenarios.

- Easy-to-Use: Both the original 10B model and its compressed versions can be loaded and run with only a few lines of code. We will soon integrate CPM-Ant into ModelCenter, making further development on our model easier.
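To illustrate the idea behind delta tuning (tuning a few million parameters while the 10B backbone stays frozen), here is a minimal, generic PyTorch sketch. It does not use OpenDelta or CPM-Ant itself; the `LowRankDelta` module and the sizes are illustrative stand-ins showing the pattern of freezing a base layer and training only a small added delta.

```python
# Hypothetical sketch of the delta-tuning pattern: freeze the base
# parameters and train only a small low-rank "delta" module, analogous
# to tuning ~6.3M of CPM-Ant's 10B parameters with OpenDelta.
import torch
import torch.nn as nn

class LowRankDelta(nn.Module):
    """A small trainable low-rank update applied beside a frozen layer."""
    def __init__(self, d_in: int, d_out: int, rank: int = 4):
        super().__init__()
        self.down = nn.Linear(d_in, rank, bias=False)  # d_in -> rank
        self.up = nn.Linear(rank, d_out, bias=False)   # rank -> d_out
    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.up(self.down(x))

base = nn.Linear(1024, 1024)      # stands in for one big pretrained layer
for p in base.parameters():
    p.requires_grad = False       # freeze the backbone entirely

delta = LowRankDelta(1024, 1024, rank=4)  # only this part is trained

def forward(x: torch.Tensor) -> torch.Tensor:
    # Base output plus the trainable low-rank correction.
    return base(x) + delta(x)

trainable = sum(p.numel() for p in delta.parameters())
total = trainable + sum(p.numel() for p in base.parameters())
print(f"trainable: {trainable} / total: {total}")
```

An optimizer would then be built over `delta.parameters()` only, so gradient state and checkpoints scale with the delta module, not the backbone; this is what keeps adaptation cheap even for a 10B model.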