We present Odyssey, a family of multimodal protein language models for sequence and structure generation, protein editing, and design. We scale Odyssey to more than 102 billion parameters, trained with over 1.1 × 10²³ FLOPs. The Odyssey architecture conditions on context modalities, categorized as structural cues, semantic descriptions, and orthologous-group metadata, and comprises two main components: a finite scalar quantizer that tokenizes continuous atomic coordinates, and a transformer stack for multimodal representation learning. Odyssey is trained via discrete diffusion, which characterizes the generative process as a time-dependent unmasking procedure. Both the finite scalar quantizer and the transformer stack leverage the consensus mechanism, a replacement for attention that uses an iterative propagation scheme informed by local agreements between residues. Across various benchmarks, Odyssey achieves landmark performance in protein generation and protein structure discretization. Our empirical findings are supported by theoretical analysis.
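To make the tokenization step concrete, the following is a minimal sketch of finite scalar quantization in the standard form (bound each latent dimension, round it to a small fixed number of levels, and fold the per-dimension indices into a single token id). The function name, the level counts, and the use of `tanh` bounding are illustrative assumptions; the abstract does not specify Odyssey's actual quantizer configuration.

```python
import numpy as np

def fsq_quantize(z, levels):
    """Hypothetical finite scalar quantization sketch.

    Bounds each latent dimension to (-1, 1), rounds it onto a grid with
    `levels[i]` discrete values, and maps the resulting index vector to a
    single integer token id via a mixed-radix encoding.
    """
    z = np.asarray(z, dtype=np.float64)
    L = np.asarray(levels, dtype=np.float64)
    bounded = np.tanh(z)                           # each dim in (-1, 1)
    idx = np.round((bounded + 1.0) / 2.0 * (L - 1))  # integers in [0, L-1]
    q = idx / (L - 1) * 2.0 - 1.0                  # quantized values in [-1, 1]
    bases = np.concatenate(([1.0], np.cumprod(L[:-1])))
    token = int((idx * bases).sum())               # id in [0, prod(levels))
    return q, token

# Example: a 3-dimensional latent with 8 x 8 x 5 = 320 possible tokens.
q, tok = fsq_quantize([0.3, -1.2, 0.7], levels=[8, 8, 5])
```

In a protein setting, a continuous per-residue structural embedding would be quantized this way, yielding a discrete structure token per residue that the transformer stack can model alongside sequence tokens.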