Уникальная ИИ-модель DolphinGemma создана специально для анализа и генерации звуков дельфинов прямо на смартфонах Pixel. Благодаря компактному размеру в 400 млн параметров и технологиям Google DeepMind, это решение открывает новые горизонты в биоакустике и распознавании речи животных.
Enter DolphinGemma. Developed by Google, this AI model makes use of specific Google audio technologies: the SoundStream tokenizer efficiently represents dolphin sounds, which are then processed by a model architecture suited for complex sequences. This ~400M parameter model is optimally-sized to run directly on the Pixel phones WDP uses in the field. This model builds upon insights from Gemma, Google’s collection of lightweight, state-of-the-art open models that are built from the same research and technology that powers our Gemini models. Trained extensively on WDP’s acoustic database of wild Atlantic spotted dolphins, DolphinGemma functions as an audio-in, audio-out model, processes sequences of natural dolphin sounds to identify patterns, structure and ultimately predict the likely subsequent sounds in a sequence, much like how large language models for human language predict the next word or token in a sentence. WDP is beginning to deploy DolphinGemma this field season with immediate potential benefits. By identifying recurring sound patterns, clusters and reliable sequences, the model can help researchers uncover hidden structures and potential meanings within the dolphins' natural communication — a task previously requiring immense human effort. Eventually, these patterns, augmented with synthetic sounds created by the researchers to refer to objects with which the dolphins like to play, may establish a shared vocabulary with the dolphins for interactive communication.