InternVL1.5: A Next-Generation Multimodal AI Model

Q: Who developed InternVL1.5?

InternVL1.5 was developed by Shanghai AI Lab (China), SenseTime (Hong Kong), Tsinghua University (China), Nanjing University (China), Fudan University (China), and the Chinese University of Hong Kong (CUHK) (Hong Kong).

Q: What tasks does InternVL1.5 handle?

Visual question answering, image captioning, object detection, character recognition (OCR), text generation, and machine translation.

// description

InternVL1.5 is an open-source multimodal AI model that narrows the gap between open-source and proprietary commercial solutions. Built on the InternViT-6B vision encoder, it handles OCR, image analysis, and complex visual question answering.

// abstract

In this report, we introduce InternVL 1.5, an open-source multimodal large language model (MLLM) to bridge the capability gap between open-source and proprietary commercial models in multimodal understanding. We introduce three simple improvements: (1) Strong Vision Encoder: we explored a continuous learning strategy for the large-scale vision foundation model, InternViT-6B, boosting its visual understanding capabilities so that it can be transferred and reused in different LLMs. (2) Dynamic High-Resolution: we divide images into 1 to 40 tiles of 448×448 pixels according to the aspect ratio and resolution of the input images, which supports up to 4K resolution input. (3) High-Quality Bilingual Dataset: we carefully collected a high-quality bilingual dataset covering common scenes and document images, and annotated it with English and Chinese question-answer pairs, significantly enhancing performance in OCR- and Chinese-related tasks. We evaluate InternVL 1.5 through a series of benchmarks and comparative studies. Compared to both open-source and proprietary models, InternVL 1.5 shows competitive performance, achieving state-of-the-art results in 8 of 18 benchmarks. Code has been released at this https URL.
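The dynamic high-resolution step described in the abstract can be illustrated with a minimal Python sketch. The abstract only states that an image is split into 1 to 40 tiles of 448×448 pixels according to its aspect ratio and resolution; the selection rule used here (pick the grid whose aspect ratio is closest to the image's, and prefer a larger grid of the same shape only when the image area exceeds half of that grid's area) is an assumption for illustration, and the names `choose_grid` and `tile_boxes` are hypothetical rather than taken from the released code.

```python
from typing import List, Tuple

TILE = 448       # tile side in pixels, as stated in the abstract
MAX_TILES = 40   # upper bound on the tile count, as stated in the abstract


def choose_grid(width: int, height: int, max_tiles: int = MAX_TILES) -> Tuple[int, int]:
    """Return a (cols, rows) grid with cols * rows <= max_tiles.

    Assumed rule: minimise |image aspect ratio - grid aspect ratio|.
    On an exact tie, switch to the later (larger) grid only if the image
    area is more than half of that grid's pixel area.
    """
    target = width / height
    area = width * height
    best = (1, 1)
    best_diff = float("inf")
    for cols in range(1, max_tiles + 1):
        for rows in range(1, max_tiles // cols + 1):
            diff = abs(target - cols / rows)
            if diff < best_diff:
                best, best_diff = (cols, rows), diff
            elif diff == best_diff and area > 0.5 * TILE * TILE * cols * rows:
                best = (cols, rows)
    return best


def tile_boxes(cols: int, rows: int) -> List[Tuple[int, int, int, int]]:
    """Crop boxes (left, top, right, bottom) for an image that has already
    been resized to (cols * TILE, rows * TILE), listed row by row."""
    return [
        (c * TILE, r * TILE, (c + 1) * TILE, (r + 1) * TILE)
        for r in range(rows)
        for c in range(cols)
    ]


if __name__ == "__main__":
    grid = choose_grid(3840, 2160)  # a 4K frame
    print(grid, len(tile_boxes(*grid)))
```

Under these assumptions, an image would first be resized to `cols * 448` by `rows * 448` pixels and then cropped with the returned boxes, so every tile matches the input size of the vision encoder.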
