An ultra-compact open-source model from IBM, built specifically for digitizing documents while fully preserving their layout, tables and equations. Thanks to its VLM architecture, Granite-Docling converts complex PDFs into a machine-readable format with high accuracy and speed.
Today, IBM is releasing Granite-Docling-258M, an ultra-compact and cutting-edge open-source vision-language model (VLM) for converting documents to machine-readable formats while fully preserving their layout, tables, equations, lists and more. It’s now available on Hugging Face under a standard Apache 2.0 license.

Granite-Docling is purpose-built for accurate and efficient document conversion, unlike most VLM-based approaches to optical character recognition (OCR), which adapt large, general-purpose models to the task. Even at just 258M parameters, Granite-Docling’s capabilities rival those of systems several times its size, making it extremely cost-effective. The model goes well beyond mere text extraction: it handles both inline and floating math and code, excels at recognizing table structure and preserves the layout and structure of the original document. Whereas conventional OCR models convert documents directly to Markdown and lose their connection to the source content, Granite-Docling faithfully translates complex structural elements, which makes its output well suited to downstream retrieval-augmented generation (RAG) applications.
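To make the point about keeping a connection to the source content concrete, here is a minimal sketch of why structured output helps a RAG pipeline. It is an illustration under stated assumptions, not Granite-Docling's actual output schema or API: `DocElement`, `Chunk` and `chunk_by_heading` are hypothetical names, and the sample elements are made up. The idea is simply that when each converted element carries its type, page and bounding box, every retrieval chunk can still point back to where it came from, which flat Markdown text cannot do.

```python
from dataclasses import dataclass
from typing import List, Tuple

BBox = Tuple[float, float, float, float]  # (x0, y0, x1, y1) on a page


@dataclass
class DocElement:
    """Hypothetical structured element; NOT Granite-Docling's real schema."""
    kind: str   # e.g. "heading", "paragraph", "table", "formula"
    text: str
    page: int   # 1-based page number in the source document
    bbox: BBox


@dataclass
class Chunk:
    text: str
    sources: List[Tuple[int, BBox]]  # (page, bbox) for every element in the chunk


def chunk_by_heading(elements: List[DocElement]) -> List[Chunk]:
    """Start a new chunk at every heading, keeping each element's location."""
    chunks: List[Chunk] = []
    for el in elements:
        if el.kind == "heading" or not chunks:
            chunks.append(Chunk(text=el.text, sources=[(el.page, el.bbox)]))
        else:
            current = chunks[-1]
            current.text += "\n" + el.text
            current.sources.append((el.page, el.bbox))
    return chunks


# Made-up sample elements, standing in for a converted document.
elements = [
    DocElement("heading", "1. Results", 3, (72.0, 90.0, 300.0, 110.0)),
    DocElement("table", "| Model | Score |", 3, (72.0, 120.0, 520.0, 260.0)),
    DocElement("heading", "2. Discussion", 4, (72.0, 90.0, 300.0, 110.0)),
    DocElement("paragraph", "The table shows ...", 4, (72.0, 120.0, 520.0, 180.0)),
]

chunks = chunk_by_heading(elements)
for c in chunks:
    pages = sorted({page for page, _ in c.sources})
    print(c.text.splitlines()[0], "-> pages", pages)
```

With provenance kept per chunk, a retrieval result can cite the exact page and region it was drawn from, and a table stays attached to the heading it belongs under.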