HAM-TTS: Realistic Speech Synthesis and Next-Generation TTS

Q: Who developed HAM-TTS?

HAM-TTS was developed by Geely Automobile Research Institute (Ningbo) Company (China), National Institute of Informatics (Japan), and Shanghai Jiao Tong University (China).

Q: What tasks does HAM-TTS address?

Text-to-speech (TTS), Speech synthesis

// tasks

Text-to-speech (TTS), Speech synthesis

// description

HAM-TTS is a zero-shot speech synthesis system built on hierarchical acoustic modeling. It targets unnatural intonation, along with the pronunciation errors and timbre inconsistency common in token-based TTS, aiming to produce natural, stable speech that closely resembles a human voice.

// abstract

Token-based text-to-speech (TTS) models have emerged as a promising avenue for generating natural and realistic speech, yet they grapple with low pronunciation accuracy, speaking style and timbre inconsistency, and a substantial need for diverse training data. In response, we introduce a novel hierarchical acoustic modeling approach complemented by a tailored data augmentation strategy and train it on the combination of real and synthetic data, scaling the data size up to 650k hours, leading to a zero-shot TTS model with 0.8B parameters. Specifically, our method incorporates a latent variable sequence containing supplementary acoustic information based on refined self-supervised learning (SSL) discrete units into the TTS model by a predictor. This significantly mitigates pronunciation errors and style mutations in synthesized speech. During training, we strategically replace and duplicate segments of the data to enhance timbre uniformity. Moreover, a pretrained few-shot voice conversion model is utilized to generate a plethora of voices with identical content yet varied timbres. This facilitates the explicit learning of utterance-level one-to-many mappings, enriching speech diversity and also ensuring consistency in timbre. Comparative experiments (Demo page: this https URL) demonstrate our model's superiority over VALL-E in pronunciation precision and maintaining speaking style, as well as timbre continuity.
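The abstract does not say how the SSL discrete units are obtained or "refined". For orientation only: a widely used way to turn self-supervised speech features into discrete units (as in HuBERT) is to assign each frame-level feature vector to its nearest k-means centroid. The minimal sketch below shows just that generic quantization step, assuming centroids from an already-fitted k-means model. The toy 2-D features, the centroids, and the names `nearest_unit` and `quantize` are hypothetical, and nothing here reproduces the HAM-TTS refinement step or its predictor.

```python
import math
from typing import List, Sequence

# Hypothetical illustration, not the HAM-TTS method: frame-level SSL features are
# replaced by tiny 2-D stand-in vectors, and the centroids are assumed to come
# from a k-means model fitted beforehand on real SSL features.
Vector = Sequence[float]


def nearest_unit(frame: Vector, centroids: Sequence[Vector]) -> int:
    """Return the index of the centroid closest to `frame` (squared Euclidean distance)."""
    best_idx, best_dist = 0, math.inf
    for idx, centroid in enumerate(centroids):
        dist = sum((f - c) ** 2 for f, c in zip(frame, centroid))
        if dist < best_dist:
            best_idx, best_dist = idx, dist
    return best_idx


def quantize(frames: Sequence[Vector], centroids: Sequence[Vector]) -> List[int]:
    """Map every feature frame to a discrete unit ID (its nearest centroid index)."""
    return [nearest_unit(frame, centroids) for frame in frames]


if __name__ == "__main__":
    centroids = [(0.0, 0.0), (1.0, 1.0), (0.0, 2.0)]
    frames = [(0.1, -0.1), (0.2, 0.1), (0.9, 1.2), (1.1, 0.8), (0.1, 1.9)]
    print(quantize(frames, centroids))  # [0, 0, 1, 1, 2]
```

In a real pipeline the frames would come from an SSL encoder and the centroids from k-means fitted on those features; the resulting unit sequence is the kind of discrete representation the abstract says HAM-TTS builds on before refinement.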


// similar models

Emu3.5 · Beijing Academy of Artificial Intelligence / BAAI · 34.1B

Gemini 2.5 Computer Use · Google

Octave 2 · Hume