Fuyu-Heavy: multimodal AI for digital agents

Q: Who developed Fuyu-Heavy?

Fuyu-Heavy was developed by Adept (United States of America).

Q: What tasks does Fuyu-Heavy handle?

Chatbot, text generation, visual question answering, system control


// description

Fuyu-Heavy from Adept is a multimodal model built specifically for driving digital agents and understanding user interfaces. Although it is 10 to 20 times smaller than GPT-4V and Gemini Ultra, Adept reports that it trails only those two models in multimodal (visual) analysis, and it is particularly strong at reading UI elements.

// abstract

We’re excited to introduce Adept Fuyu-Heavy, a new multimodal model designed specifically for digital agents. Fuyu-Heavy is the world’s third-most-capable multimodal model, behind only GPT-4V and Gemini Ultra, which are 10-20 times bigger. We’re excited about this model because: It excels at multimodal reasoning. To us the killer feature is UI understanding, but it also performs well on more traditional multimodal benchmarks. In particular, Fuyu-Heavy scores higher on the MMMU benchmark than even Gemini Pro. On standard text-based benchmarks, it matches or exceeds the performance of models in the same compute class despite having to devote some of its capacity to image modeling. It demonstrates that (with some modifications) we can scale up the Fuyu architecture and reap all of the associated benefits, including handling arbitrary size/shape images and efficiently re-using existing transformer optimizations.
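How can one architecture accept images of arbitrary size and shape? Below is a minimal sketch. It assumes that Fuyu-Heavy keeps the input scheme Adept described for the smaller Fuyu-8B. In that scheme there is no separate image encoder: the image is cut into fixed-size patches, each patch is linearly projected straight into the decoder's embedding space, and patches are fed in raster order with a newline token closing each row. The abstract only says the architecture was scaled up "with some modifications", so every detail here is an assumption. The 30x30 patch size matches the default of the Hugging Face Fuyu-8B image processor. The token name `NEWLINE`, the edge-padding rule, and the helper functions are hypothetical illustrations.

```python
import math
from typing import List, Tuple, Union

# Assumed patch size (the Fuyu-8B processor default); not confirmed for Fuyu-Heavy.
PATCH = 30
# Hypothetical name for the "end of patch row" token.
NEWLINE = "<image-newline>"

Token = Union[Tuple[int, int], str]


def patch_sequence(height: int, width: int, patch: int = PATCH) -> List[Token]:
    """Return the raster-order token layout for an image of any height/width.

    Each patch is identified by its (row, col) position. A NEWLINE token
    closes every patch row so the decoder can recover the 2-D layout
    without a fixed input resolution.
    """
    if height <= 0 or width <= 0:
        raise ValueError("image dimensions must be positive")
    rows = math.ceil(height / patch)  # assume the bottom edge is padded to a full patch
    cols = math.ceil(width / patch)   # assume the right edge is padded to a full patch
    seq: List[Token] = []
    for r in range(rows):
        for c in range(cols):
            seq.append((r, c))
        seq.append(NEWLINE)
    return seq


def project_patch(pixels: List[float], weights: List[List[float]]) -> List[float]:
    """Linearly map a flattened patch (length P) to a d_model-sized embedding.

    `weights` is a d_model x P matrix; this single matrix multiply stands in
    for the image encoder that other multimodal models use.
    """
    if any(len(row) != len(pixels) for row in weights):
        raise ValueError("each weight row must match the flattened patch length")
    return [sum(w * x for w, x in zip(row, pixels)) for row in weights]


if __name__ == "__main__":
    # A 45x70 image becomes 2 rows x 3 cols of patches, plus one NEWLINE per row.
    print(patch_sequence(45, 70))
```

Because the image becomes an ordinary token sequence, its length simply grows with the image area. This is why the design can reuse standard decoder-only transformer optimizations unchanged.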
