Lumina-Image-2.0

Shanghai AI Lab, University of Sydney, Chinese University of Hong Kong (CUHK), Shanghai Jiao Tong University, Krea AI
Image generation, Text-to-image

Lumina-Image 2.0 is a text-to-image generation model built on the Unified Next-DiT architecture, which processes text tokens and image tokens as a single joint sequence so that the two modalities interact directly. The model is aimed at producing detailed, photorealistic images.

We introduce Lumina-Image 2.0, an advanced text-to-image generation framework that achieves significant progress compared to previous work, Lumina-Next. Lumina-Image 2.0 is built upon two key principles: (1) Unification: it adopts a unified architecture (Unified Next-DiT) that treats text and image tokens as a joint sequence, enabling natural cross-modal interactions and allowing seamless task expansion. In addition, since high-quality captioners can provide semantically well-aligned text-image training pairs, we introduce a unified captioning system, Unified Captioner (UniCap), specifically designed for T2I generation tasks. UniCap excels at generating comprehensive and accurate captions, accelerating convergence and enhancing prompt adherence. (2) Efficiency: to improve the efficiency of our proposed model, we develop multi-stage progressive training strategies and introduce inference acceleration techniques without compromising image quality. Extensive evaluations on academic benchmarks and public text-to-image arenas show that Lumina-Image 2.0 delivers strong performance even with only 2.6B parameters, highlighting its scalability and design efficiency. We have released our training details, code, and models.
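To make the "joint sequence" idea in the abstract concrete, here is a minimal sketch of self-attention applied to text and image tokens concatenated into one sequence. It is not the Unified Next-DiT implementation: the function name `joint_self_attention`, the toy 4-dimensional tokens, and the absence of learned projections, multiple heads, and positional encoding are all simplifying assumptions made for illustration.

```python
import math


def softmax(row):
    """Numerically stable softmax over a list of floats."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def joint_self_attention(text_tokens, image_tokens):
    """Single-head self-attention over the concatenated [text; image] sequence.

    Assumption for brevity: queries, keys, and values are the raw token
    vectors themselves (no learned projection matrices).
    """
    seq = text_tokens + image_tokens  # one joint sequence, text first
    dim = len(seq[0])
    scale = 1.0 / math.sqrt(dim)
    out = []
    for q in seq:
        # Every token scores against every other token, regardless of modality.
        scores = [scale * sum(a * b for a, b in zip(q, k)) for k in seq]
        weights = softmax(scores)
        out.append(
            [sum(w * v[d] for w, v in zip(weights, seq)) for d in range(dim)]
        )
    return out


# Hypothetical toy inputs: 2 text tokens and 3 image tokens, each of size 4.
text_tokens = [[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0]]
image_tokens = [
    [0.0, 0.0, 1.0, 0.0],
    [0.0, 0.0, 0.0, 1.0],
    [0.5, 0.5, 0.0, 0.0],
]
mixed = joint_self_attention(text_tokens, image_tokens)
```

Because both modalities share one attention pass, each image token's output already contains information from the text tokens (and vice versa), with no separate cross-attention module in this sketch.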
