The multimodal model Palmyra Vision combines vision and text, handling tasks such as chart analysis and handwriting recognition. It posts strong results on visual question answering, outperforming many popular alternatives.
Palmyra Vision is a multimodal large language model (LLM) with vision capabilities, developed by Writer, that can analyze images and generate text based on them. It excels at tasks such as extracting handwritten text, classifying objects, analyzing graphs and charts, and answering specific questions about visual inputs. Palmyra Vision achieved a score of 84.4% on the VQAv2 benchmark, outperforming other prominent multimodal models. It has a range of practical enterprise applications, including generating product descriptions, interpreting charts and graphs, compliance detection, improving accessibility by creating alt-text descriptions, and extracting text from handwritten reports.