Eleven Scribe from ElevenLabs is a state-of-the-art speech recognition model that supports 99 languages with exceptional accuracy. Beyond converting audio to text, the model can separate speakers, add timestamps, and ignore real-world background noise.
Scribe, our first Speech to Text model, is the world’s most accurate transcription model. Built to handle the unpredictability of real-world audio, Scribe transcribes speech in 99 languages, featuring word-level timestamps, speaker diarization, and audio-event tagging, all delivered in a structured response for seamless integration. Scribe is engineered for precision. In FLEURS and Common Voice benchmark tests across 99 languages, it consistently outperforms leading models like Gemini 2.0 Flash, Whisper Large V3, and Deepgram Nova-3. Whether it’s meeting summaries, movie subtitles, or even song lyrics, Scribe delivers the lowest word error rate in automated transcription across Italian (98.7% accuracy), English (96.7% accuracy), and 97 other languages. Scribe makes ASR universally accessible, dramatically reducing errors in traditionally underserved languages such as Serbian, Cantonese, and Malayalam, where competing models often exceed 40% word error rates.
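
For readers unfamiliar with the metric, word error rate (WER) is conventionally the word-level edit distance between a reference transcript and the model's output, divided by the number of reference words. The percentages quoted above read as accuracy figures, which we assume here to mean 1 minus WER (so 98.7% would correspond to a WER of roughly 1.3%); the source does not state this explicitly. The sketch below is a minimal, generic WER calculation, not the benchmark harness used for FLEURS or Common Voice, and the function name `word_error_rate` is our own. It skips text normalization (casing, punctuation), which real evaluations usually apply first.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return WER = (substitutions + deletions + insertions) / reference words.

    Illustrative helper (hypothetical name); splits on whitespace only and
    applies no casing or punctuation normalization.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference prefix processed
    # so far and the first j hypothesis words (Levenshtein, row by row).
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # deleting all i reference words to match an empty hypothesis
        for j, hyp_word in enumerate(hyp, start=1):
            substitution_cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,                      # deletion
                curr[j - 1] + 1,                  # insertion
                prev[j - 1] + substitution_cost,  # match or substitution
            ))
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    wer = word_error_rate("the cat sat on the mat", "the cat sat on a mat")
    print(f"WER: {wer:.3f}, accuracy (assumed 1 - WER): {1 - wer:.3f}")
```

Note that WER can exceed 100% when the hypothesis contains many inserted words, so an accuracy defined as 1 minus WER is not strictly bounded below by zero.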