Anthropic LM 175B: языковая модель с самокоррекцией

Q: Кто разработал Anthropic LM 175B?

Модель Anthropic LM 175B разработана компанией Anthropic (United States of America).

Q: Какие задачи решает Anthropic LM 175B?

Генерация текста, Ответы на вопросы

// задачи

Генерация текстаОтветы на вопросы

// описание

Мощная языковая модель от Anthropic, исследующая способность ИИ к «моральному самоисправлению» через обучение с подкреплением (RLHF). Нейросеть эффективно учится избегать вредоносных ответов, следуя этическим установкам в процессе генерации текста. Это важный шаг в создании безопасного и контролируемого искусственного интеллекта.

// abstract

We test the hypothesis that language models trained with reinforcement learning from human feedback (RLHF) have the capability to "morally self-correct" -- to avoid producing harmful outputs -- if instructed to do so. We find strong evidence in support of this hypothesis across three different experiments, each of which reveal different facets of moral self-correction. We find that the capability for moral self-correction emerges at 22B model parameters, and typically improves with increasing model size and RLHF training. We believe that at this level of scale, language models obtain two capabilities that they can use for moral self-correction: (1) they can follow instructions and (2) they can learn complex normative concepts of harm like stereotyping, bias, and discrimination. As such, they can follow instructions to avoid certain kinds of morally harmful outputs. We believe our results are cause for cautious optimism regarding the ability to train language models to abide by ethical principles.

// faq

Что такое Anthropic LM 175B?+

Кто разработал Anthropic LM 175B?+

Какие задачи решает Anthropic LM 175B?+

// похожие модели