
BaiLing TTS

Ant Group
Text-to-speech (TTS)

BaiLing TTS is an AI system from Ant Group built for high-quality speech synthesis in Chinese dialects. The model handles complex intonation patterns and regional features that conventional TTS services previously could not reproduce.

Large-scale text-to-speech (TTS) models have made significant progress. However, they still fall short in the generation of Chinese dialectal speech. To address this, we propose Bailing-TTS, a family of large-scale TTS models capable of generating high-quality Chinese dialectal speech. Bailing-TTS serves as a foundation model for Chinese dialectal speech generation. First, continual semi-supervised learning is proposed to facilitate the alignment of text tokens and speech tokens. Second, Chinese dialectal representation learning is developed using a specific transformer architecture and multi-stage training processes. With the proposed network architecture and corresponding training strategy, Bailing-TTS is able to generate Chinese dialectal speech from text effectively and efficiently. Experiments demonstrate that Bailing-TTS generates Chinese dialectal speech that approaches human-like spontaneous speech. Readers are encouraged to listen to the accompanying demos.

What is BaiLing TTS?
Who developed BaiLing TTS?
What problems does BaiLing TTS solve?