E2 TTS: Revolutionary Speech Synthesis from Microsoft

Q: Who developed E2 TTS?

E2 TTS was developed by Microsoft (United States of America).

Q: What tasks does E2 TTS address?

Text-to-speech (TTS), Speech synthesis

// tasks

Text-to-speech (TTS), Speech synthesis

// description

Microsoft has introduced E2 TTS, a speech synthesis system that reaches human-level naturalness from only a short voice sample. Its fully non-autoregressive architecture clones a voice zero-shot, that is, without speaker-specific training, and the paper reports state-of-the-art intelligibility and speaker similarity.

// abstract

This paper introduces Embarrassingly Easy Text-to-Speech (E2 TTS), a fully non-autoregressive zero-shot text-to-speech system that offers human-level naturalness and state-of-the-art speaker similarity and intelligibility. In the E2 TTS framework, the text input is converted into a character sequence with filler tokens. The flow-matching-based mel spectrogram generator is then trained based on the audio infilling task. Unlike many previous works, it does not require additional components (e.g., duration model, grapheme-to-phoneme) or complex techniques (e.g., monotonic alignment search). Despite its simplicity, E2 TTS achieves state-of-the-art zero-shot TTS capabilities that are comparable to or surpass previous works, including Voicebox and NaturalSpeech 3. The simplicity of E2 TTS also allows for flexibility in the input representation. We propose several variants of E2 TTS to improve usability during inference. See this https URL for demo samples.
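
The input preparation described in the abstract can be illustrated with a minimal sketch. It assumes the filler tokens pad the character sequence up to the number of mel spectrogram frames, and that the audio infilling task keeps a leading audio prompt as context and asks the generator to fill in the rest. The abstract does not spell out either detail, and the `<F>` symbol and both function names are hypothetical.

```python
FILLER = "<F>"  # hypothetical filler symbol; the abstract does not name it


def to_padded_chars(text: str, num_frames: int) -> list[str]:
    """Split text into characters and pad with filler tokens to num_frames.

    Assumption: the padded length equals the number of mel frames, so the
    text and audio sequences line up without a duration model.
    """
    chars = list(text)
    if len(chars) > num_frames:
        raise ValueError("text is longer than the target number of frames")
    return chars + [FILLER] * (num_frames - len(chars))


def infill_mask(num_frames: int, prompt_frames: int) -> list[bool]:
    """Return a per-frame mask for the infilling task.

    True marks frames the generator has to produce; False marks frames of
    the audio prompt that are kept as context (an assumed layout).
    """
    if not 0 <= prompt_frames <= num_frames:
        raise ValueError("prompt_frames must be between 0 and num_frames")
    return [False] * prompt_frames + [True] * (num_frames - prompt_frames)


if __name__ == "__main__":
    print(to_padded_chars("hi", 5))
    print(infill_mask(5, 2))
```

Under these assumptions no grapheme-to-phoneme step or alignment search appears anywhere in the preparation: the model receives raw characters plus padding and has to learn the alignment itself.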

// similar models

Emu3.5

Beijing Academy of Artificial Intelligence / BAAI

34.1B

Gemini 2.5 Computer Use

Google

Octave 2

Hume