Robotics, Computer Vision, Speech Recognition

Gemini Robotics-ER

Google DeepMind
Instruction following, Robotic manipulation, Speech recognition, Object recognition, Object detection

Gemini Robotics-ER is a specialized AI model for robotics, focused on embodied intelligence and spatial reasoning. It helps robots better understand the physical world and connect high-level instructions to the real actions of manipulators.

Alongside Gemini Robotics, we’re introducing an advanced vision-language model called Gemini Robotics-ER (short for “embodied reasoning”). This model enhances Gemini’s understanding of the world in ways necessary for robotics, focusing especially on spatial reasoning, and allows roboticists to connect it with their existing low-level controllers. Gemini Robotics-ER improves Gemini 2.0’s existing abilities, such as pointing and 3D detection, by a large margin. Combining spatial reasoning and Gemini’s coding abilities, Gemini Robotics-ER can instantiate entirely new capabilities on the fly. For example, when shown a coffee mug, the model can intuit an appropriate two-finger grasp for picking it up by the handle and a safe trajectory for approaching it. Gemini Robotics-ER can perform all the steps necessary to control a robot right out of the box, including perception, state estimation, spatial understanding, planning and code generation. In such an end-to-end setting, the model achieves a success rate 2x-3x that of Gemini 2.0. And where code generation is not sufficient, Gemini Robotics-ER can even tap into the power of in-context learning, following the patterns of a handful of human demonstrations to provide a solution.
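
The idea of connecting the model's pointing output to an existing low-level controller can be sketched as follows. This is only an illustration under stated assumptions, not DeepMind's implementation: it assumes the model returns a JSON list of labeled points as `[y, x]` pairs normalized to a 0-1000 range, and `RecordingController`, `parse_points`, and `dispatch_pick` are hypothetical names invented for this example. No real model or robot API is called.

```python
import json
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class PixelPoint:
    label: str
    x: int  # column, in pixels
    y: int  # row, in pixels


def parse_points(model_json: str, width: int, height: int,
                 scale: int = 1000) -> List[PixelPoint]:
    """Convert normalized [y, x] points (ASSUMED range 0..scale) to pixels."""
    points = []
    for item in json.loads(model_json):
        y_norm, x_norm = item["point"]
        # Clamp so a value at the upper bound still lands inside the image.
        x = min(width - 1, max(0, round(x_norm / scale * width)))
        y = min(height - 1, max(0, round(y_norm / scale * height)))
        points.append(PixelPoint(item["label"], x, y))
    return points


class RecordingController:
    """Hypothetical stand-in for a robot's existing low-level controller."""

    def __init__(self) -> None:
        self.commands = []

    def move_above(self, point: PixelPoint) -> None:
        # A real controller would map the pixel to a 3D pose and move the arm.
        self.commands.append(("move_above", point.label, point.x, point.y))


def dispatch_pick(model_json: str, width: int, height: int, target: str,
                  controller: RecordingController) -> Optional[PixelPoint]:
    """Send the first point whose label matches `target` to the controller."""
    for point in parse_points(model_json, width, height):
        if point.label == target:
            controller.move_above(point)
            return point
    return None


if __name__ == "__main__":
    # Hand-written response in the assumed format (not real model output).
    response = '[{"point": [500, 250], "label": "mug handle"}]'
    controller = RecordingController()
    dispatch_pick(response, 640, 480, "mug handle", controller)
    print(controller.commands)
```

Keeping the parsing step separate from the controller call means the same perception output could feed whatever controller interface a lab already has; only the `move_above` adapter would need to change.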

What is Gemini Robotics-ER?
Who developed Gemini Robotics-ER?
What tasks does Gemini Robotics-ER solve?