MoLFormer-XL is IBM's "Swiss Army knife" for molecular chemistry: a transformer-based foundation model that predicts the properties of chemical compounds. It helps scientists infer molecular structure and behavior and design entirely new molecules for medicine.
Large pretrained models are fast becoming AI's Swiss Army knife. Once limited to summarizing text and translating languages, they can now write code, compose music, and answer obscure questions at length. Now there's a new skill to add to their repertoire: the ability to infer the shapes and properties of molecules to predict how they might behave and to propose entirely new ones.

Most molecular models need estimates or measurements of a molecule's 3D shape to accurately predict many of its properties. Chemists can extract this information through simulations or lab experiments, but it's an imperfect, expensive process that can take months to years. Perhaps unsurprisingly, we have detailed structures for only a few million molecules out of the trillions upon trillions potentially out there.

But now, there could be a way to eliminate this bottleneck in the discovery process with the help of AI. Introducing MoLFormer-XL, the latest addition to the MoLFormer family of foundation models for molecular discovery. MoLFormer-XL has been pretrained on 1.1 billion molecules represented as machine-readable strings of text. From these simple and accessible chemical representations, it turns out that a transformer can extract enough information to infer a molecule's form and function.
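The "machine-readable strings of text" here are SMILES strings, which encode a molecule's atoms and bonds as plain characters. As a minimal sketch of how a transformer can consume them, the snippet below splits a SMILES string into chemically meaningful tokens; the regex is a commonly used illustrative pattern, not MoLFormer's actual tokenizer or vocabulary.

```python
import re

# Illustrative regex for splitting SMILES into tokens: bracketed atoms,
# two-letter elements (Br, Cl), single atoms, bonds, branches, ring closures.
# This is a common pattern for demonstration, not MoLFormer's tokenizer.
SMILES_TOKEN = re.compile(
    r"(\[[^\]]+\]|Br|Cl|[BCNOPSFI]|[bcnops]|@@|@|=|#|\(|\)|\+|-|/|\\|%\d{2}|\d)"
)

def tokenize(smiles: str) -> list[str]:
    """Split a SMILES string into tokens a transformer could embed."""
    return SMILES_TOKEN.findall(smiles)

# Aspirin as a SMILES string -- just text, no 3D coordinates required.
aspirin = "CC(=O)Oc1ccccc1C(=O)O"
print(tokenize(aspirin))
```

Once tokenized, each token is mapped to an embedding and fed through standard transformer attention layers, which is what lets the model learn structure and properties from text alone.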