Odyssey 12B is a powerful multimodal neural network from Anthrogen, built for protein design and editing. The model uses structural cues and metadata to generate new biological sequences, opening new horizons in AI-driven bioengineering.
We present Odyssey, a family of multimodal protein language models for sequence and structure generation, protein editing and design. We scale Odyssey to more than 102 billion parameters, trained over 1.1 × 10^23 FLOPs. The Odyssey architecture uses context modalities, categorized as structural cues, semantic descriptions, and orthologous group metadata, and comprises two main components: a finite scalar quantizer for tokenizing continuous atomic coordinates, and a transformer stack for multimodal representation learning. Odyssey is trained via discrete diffusion, and characterizes the generative process as a time-dependent unmasking procedure. The finite scalar quantizer and transformer stack leverage the consensus mechanism, a replacement for attention that uses an iterative propagation scheme informed by local agreements between residues. Across various benchmarks, Odyssey achieves landmark performance for protein generation and protein structure discretization. Our empirical findings are supported by theoretical analysis.
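To make the idea of tokenizing continuous values with a finite scalar quantizer more concrete, the following is a minimal NumPy sketch of generic finite scalar quantization: each latent channel is squashed into a bounded range, rounded to the nearest integer level, and the per-channel levels are packed into a single token index. It is an illustration under stated assumptions, not the Odyssey implementation. The function names, the level counts `[5, 5, 3]`, the restriction to odd level counts, and the use of a raw vector `z` in place of a learned encoder output are all hypothetical choices made for the example.

```python
import numpy as np

def fsq_quantize(z, levels):
    """Round each channel of z to one of levels[i] integer codes.

    Assumption: odd level counts only, so codes are symmetric around 0.
    """
    levels = np.asarray(levels)
    if np.any(levels % 2 == 0):
        raise ValueError("this sketch handles odd level counts only")
    half = (levels - 1) / 2          # e.g. 5 levels -> codes in {-2, ..., 2}
    bounded = np.tanh(z) * half      # squash each channel into [-half, half]
    return np.round(bounded).astype(np.int64)

def _basis(levels):
    # Mixed-radix place values: 1, L_0, L_0 * L_1, ...
    return np.concatenate(([1], np.cumprod(levels[:-1])))

def codes_to_index(codes, levels):
    """Pack per-channel codes into one integer token per vector."""
    levels = np.asarray(levels)
    digits = codes + (levels - 1) // 2   # shift codes to {0, ..., L-1}
    return (digits * _basis(levels)).sum(axis=-1)

def index_to_codes(index, levels):
    """Invert codes_to_index, recovering the per-channel codes."""
    levels = np.asarray(levels)
    digits = (np.asarray(index)[..., None] // _basis(levels)) % levels
    return digits - (levels - 1) // 2

# Hypothetical example: two 3-channel latent vectors, codebook size 5 * 5 * 3 = 75.
levels = [5, 5, 3]
z = np.array([[0.0, 0.0, 0.0], [10.0, -10.0, 10.0]])
codes = fsq_quantize(z, levels)
tokens = codes_to_index(codes, levels)
```

Because the codebook is the implicit product grid of the per-channel levels, there are no codebook vectors to learn; in a trained model the rounding step would typically be paired with a straight-through gradient estimator, which this sketch omits.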