RFM-1: A multimodal AI model for robotics

Q: Who developed RFM-1?

RFM-1 was developed by Covariant (United States of America).

Q: What tasks does RFM-1 perform?

Robotic manipulation, Image captioning, Video description


// description

Covariant's RFM-1 is an 8-billion-parameter multimodal AI model designed to act as a "brain" for robots. It was trained on text, images, video, robot actions, and sensor data, which lets it model the physical world and carry out complex object-manipulation tasks.

// abstract

What is RFM-1?

Set up as a multimodal any-to-any sequence model, RFM-1 is an 8 billion parameter transformer trained on text, images, videos, robot actions, and a range of numerical sensor readings. By tokenizing all modalities into a common space and performing autoregressive next-token prediction, RFM-1 uses its broad range of input and output modalities to enable diverse applications. For example, it can perform image-to-image learning for scene analysis tasks like segmentation and identification. It can combine text instructions with image observations to generate desired grasp actions or motion sequences. It can pair a scene image with a targeted grasp image to predict outcomes as videos or simulate the numerical sensor readings that would occur along the way.
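
The abstract describes the core mechanism only at a high level: every modality is tokenized into one shared space, and the model predicts the next token autoregressively. The Python sketch below illustrates that idea under loudly stated assumptions. The per-modality vocabulary sizes, the uniform binning of sensor readings, the helper names (`to_shared`, `quantize_sensor`, `BigramModel`), and the bigram counter standing in for the 8-billion-parameter transformer are all hypothetical and do not reflect Covariant's actual tokenizers or architecture.

```python
from collections import Counter, defaultdict
from typing import Optional

# HYPOTHETICAL vocabulary layout (not RFM-1's): each modality gets its own
# contiguous ID range so every token lives in one shared integer space.
MODALITY_SIZES = {"text": 256, "image": 512, "action": 64, "sensor": 32}


def _build_offsets(sizes: dict[str, int]) -> tuple[dict[str, int], int]:
    """Assign each modality a start offset; return offsets and total vocab size."""
    offsets, start = {}, 0
    for name, size in sizes.items():
        offsets[name] = start
        start += size
    return offsets, start


OFFSETS, VOCAB_SIZE = _build_offsets(MODALITY_SIZES)  # VOCAB_SIZE == 864


def to_shared(modality: str, local_id: int) -> int:
    """Map a modality-local token id into the shared vocabulary."""
    if not 0 <= local_id < MODALITY_SIZES[modality]:
        raise ValueError(f"{modality} id {local_id} out of range")
    return OFFSETS[modality] + local_id


def from_shared(token: int) -> tuple[str, int]:
    """Recover (modality, local_id) from a shared-vocabulary token."""
    for name, size in MODALITY_SIZES.items():
        if OFFSETS[name] <= token < OFFSETS[name] + size:
            return name, token - OFFSETS[name]
    raise ValueError(f"token {token} outside vocabulary")


def quantize_sensor(value: float, lo: float, hi: float) -> int:
    """ASSUMPTION: uniformly bin a scalar reading in [lo, hi] into a sensor token."""
    bins = MODALITY_SIZES["sensor"]
    clipped = min(max(value, lo), hi)
    local = min(int((clipped - lo) / (hi - lo) * bins), bins - 1)
    return to_shared("sensor", local)


class BigramModel:
    """Toy stand-in for the transformer: predicts the next token from the last one."""

    def __init__(self) -> None:
        self.counts: dict[int, Counter] = defaultdict(Counter)

    def fit(self, sequences: list[list[int]]) -> None:
        # Count how often each token follows each other token.
        for seq in sequences:
            for prev, nxt in zip(seq, seq[1:]):
                self.counts[prev][nxt] += 1

    def next_token(self, prev: int) -> Optional[int]:
        followers = self.counts.get(prev)
        if not followers:
            return None
        return followers.most_common(1)[0][0]

    def generate(self, prompt: list[int], max_new: int) -> list[int]:
        """Greedy autoregressive decoding: each prediction is appended and fed back."""
        seq = list(prompt)
        for _ in range(max_new):
            nxt = self.next_token(seq[-1])
            if nxt is None:
                break
            seq.append(nxt)
        return seq


# Usage: one made-up episode mixing all four modalities in a single sequence.
episode = (
    [to_shared("text", t) for t in (10, 11)]                 # instruction tokens
    + [to_shared("image", p) for p in (3, 7)]                # image-patch codes
    + [to_shared("action", a) for a in (5, 9)]               # discretized actions
    + [quantize_sensor(v, 0.0, 10.0) for v in (2.5, 9.9)]    # sensor readings
)
model = BigramModel()
model.fit([episode])
out = model.generate(episode[:2], max_new=6)
print([from_shared(t) for t in out])
# → [('text', 10), ('text', 11), ('image', 3), ('image', 7), ('action', 5), ('action', 9), ('sensor', 8), ('sensor', 31)]
```

Giving each modality a disjoint ID range is one simple way to let a single output distribution cover text, actions, and sensor values, which is what makes an any-to-any setup possible in principle. A real system would condition on the whole prefix with a transformer rather than on just the previous token.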

// faq

What is RFM-1?

Who developed RFM-1?

What tasks does RFM-1 perform?

// similar models

π0.7 (pi-0.7)

Physical Intelligence