Language model

LLaDA

Renmin University of China, Ant Group
Code generation, Text generation

The LLaDA project challenges the dominance of autoregressive models by proposing an innovative approach based on diffusion algorithms. The model is trained through a data-masking process, offering a new perspective on efficient generation of text and program code.

Autoregressive models (ARMs) are widely regarded as the cornerstone of large language models (LLMs). We challenge this notion by introducing LLaDA, a diffusion model trained from scratch under the pre-training and supervised fine-tuning (SFT) paradigm. LLaDA models distributions through a forward data-masking process and a reverse process, parameterized by a vanilla Transformer to predict masked tokens. By optimizing a likelihood bound, it provides a principled generative approach for probabilistic inference. Across extensive benchmarks, LLaDA demonstrates strong scalability, outperforming our self-constructed ARM baselines. Remarkably, LLaDA 8B is competitive with strong LLMs like LLaMA3 8B in in-context learning and, after SFT, exhibits impressive instruction-following abilities in case studies such as multi-turn dialogue. Moreover, LLaDA addresses the reversal curse, surpassing GPT-4o in a reversal poem completion task. Our findings establish diffusion models as a viable and promising alternative to ARMs, challenging the assumption that the key LLM capabilities discussed above are inherently tied to ARMs. Project page and code: https://ml-gsai.github.io/LLaDA-demo/.
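The reverse process described above can be sketched in a few lines: start from a fully masked sequence, have the model predict every masked token in parallel, keep the most confident predictions, and re-mask the rest for the next step. The sketch below is a toy illustration only, assuming a hypothetical `toy_predictor` in place of the real Transformer and a simple linear masking schedule; it is not LLaDA's actual implementation.

```python
import random

MASK = "[MASK]"

def toy_predictor(tokens):
    """Stand-in for the mask-predicting Transformer (hypothetical).

    Returns {position: (predicted_token, confidence)} for each masked
    position. The real model would produce these from learned logits.
    """
    vocab = ["the", "cat", "sat", "on", "a", "mat"]
    return {i: (random.choice(vocab), random.random())
            for i, t in enumerate(tokens) if t == MASK}

def reverse_process(length=8, steps=4, seed=0):
    """Iterative unmasking: predict all masked tokens, then re-mask the
    lowest-confidence positions so the masked fraction shrinks each step."""
    random.seed(seed)
    tokens = [MASK] * length
    for step in range(steps, 0, -1):
        preds = toy_predictor(tokens)
        # Linear schedule: how many tokens should remain masked afterwards.
        keep_masked = round(length * (step - 1) / steps)
        # Commit every prediction first...
        for i, (tok, _) in preds.items():
            tokens[i] = tok
        # ...then re-mask the least confident ones for the next iteration.
        by_conf = sorted(preds.items(), key=lambda kv: kv[1][1])
        for i, _ in by_conf[:keep_masked]:
            tokens[i] = MASK
    return tokens

print(reverse_process())  # a fully unmasked token sequence
```

After the final step the schedule reaches zero masked positions, so the output contains no `[MASK]` tokens; low-confidence re-masking is one of several possible remasking strategies.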

What is LLaDA?
Who developed LLaDA?
What tasks does LLaDA solve?