Predicting the 3D structure of a protein from its amino acid sequence is difficult because of the very large number of possible structures. It is considered an important challenge in computational biology and chemistry. Large tech corporations have started to tackle this problem using artificial intelligence. Both Alphabet (Google’s parent company) and Meta (formerly Facebook) have thrown their hats into the ring: Alphabet’s subsidiary DeepMind has developed the AI program AlphaFold, and Meta’s artificial intelligence laboratory Meta AI has developed ESMFold. Both programs aim to predict protein structures.
Alexander Rives, Meta AI, New York, USA, and colleagues have used AI to predict the structures of more than 617 million proteins. These proteins stem from a database of metagenomic DNA. Metagenomics is the direct analysis of genomes contained within an environmental sample, which can contain many microbial species. Meta AI calls these proteins “the dark matter of the protein universe” because they are not well understood. The structures have been released as the ESM Metagenomic Atlas.
The team used a large language model to quickly predict a protein’s structure from its amino acid sequence. As in human languages, there are patterns in protein sequences that an AI model can “learn” and use to make predictions. The researchers trained models with up to 15 billion parameters. The resulting method is up to 60 times faster than state-of-the-art prediction methods, which allowed the team to tackle the large number of metagenomic proteins. Of the 617 million structures, ca. 225 million are considered “high confidence predictions”, i.e., they are probably modeled well. According to the researchers, the work could help to provide insights into the diversity of natural proteins and to accelerate the discovery of new protein structures and functions.
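To put the figures quoted above into proportion, the following minimal Python sketch computes the share of high-confidence predictions from the rounded numbers in the text (617 million and ca. 225 million). It also shows how structures might be filtered by a per-structure confidence score. The function names, the example identifiers, the scores, and the cutoff of 70 are all hypothetical and chosen only for illustration; they are not taken from the researchers' code or from the Atlas itself.

```python
def high_confidence_share(total: int, high_confidence: int) -> float:
    """Return the fraction of predictions counted as high confidence."""
    if total <= 0:
        raise ValueError("total must be positive")
    return high_confidence / total


def filter_by_confidence(scores: dict[str, float], threshold: float) -> list[str]:
    """Return the identifiers whose confidence score meets the threshold."""
    return [name for name, score in scores.items() if score >= threshold]


# Rounded figures from the article text.
share = high_confidence_share(617_000_000, 225_000_000)
print(f"High-confidence share: {share:.1%}")  # prints "High-confidence share: 36.5%"

# Hypothetical identifiers and scores; the cutoff of 70 is an assumption.
toy_scores = {"protein_a": 91.2, "protein_b": 48.5, "protein_c": 73.0}
kept = filter_by_confidence(toy_scores, threshold=70.0)
print(kept)  # prints "['protein_a', 'protein_c']"
```

In other words, roughly one in three of the predicted structures falls into the high-confidence group, which still amounts to hundreds of millions of structures.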
- Evolutionary-scale prediction of atomic level protein structure with a language model,
Zeming Lin, Halil Akin, Roshan Rao, Brian Hie, Zhongkai Zhu, Wenting Lu, Nikita Smetanin, Robert Verkuil, Ori Kabeli, Yaniv Shmueli, Allan dos Santos Costa, Maryam Fazel-Zarandi, Tom Sercu, Salvatore Candido, Alexander Rives,
bioRxiv 2022.
https://doi.org/10.1101/2022.07.20.500902
The research has been published as a preprint and has not yet been peer-reviewed.
- ESM Metagenomic Atlas