- By - Gaurav Masand
- Posted in AI, Research, technology
ESMFold from Meta Is the ChatGPT of Proteins
Proteins are a fundamental component of living organisms and play a central role in many biological processes. Determining the three-dimensional structure of a protein is crucial to understanding its function, its interactions with other molecules, and its potential therapeutic applications. However, traditional experimental methods for determining protein structure, such as X-ray crystallography, NMR spectroscopy, and cryo-electron microscopy, can be time-consuming and expensive. Predicting the structure computationally is an alternative, but it is a challenging task that requires integrating different types of data, including sequence, structure, and evolutionary information. Machine learning algorithms have been developed to analyze this data and generate accurate predictions. One such algorithm is the neural network-based AlphaFold, developed by researchers at DeepMind, which has shown remarkable success in predicting the structure of proteins (Improved protein structure prediction using potentials from deep learning | Nature).
Another method for predicting protein structure is Rosetta, which uses a combination of computational methods and experimental data to generate structure predictions (Blocking FSH induces thermogenic adipose tissue and reduces body fat | Nature). This method has been used to predict the structure of many proteins, including the spike protein of the SARS-CoV-2 virus.
In recent years, the use of artificial intelligence (AI) to predict protein structure has advanced significantly, and one of the most notable new models is ESMFold from Meta.
ESMFold is an AI model for protein structure prediction that uses the representations from a large language model, ESM2, to generate an accurate structure prediction directly from the amino acid sequence of a protein. According to the creators of ESMFold, a breakthrough in the speed of protein folding was necessary to make structure predictions at the scale of metagenomic sequence databases, which contain hundreds of millions of proteins (ESM Metagenomic Atlas by Meta AI (esmatlas.com)). ESMFold’s training data includes over 170,000 protein structures from the Protein Data Bank and the sequences of the proteins that they correspond to.
One of the key features of ESMFold is its speed. According to a news report in the journal Nature, ESMFold is about 60 times faster than its competitor AlphaFold at predicting protein structures for short sequences, although it is not as accurate (AlphaFold’s new rival? Meta AI predicts shape of 600 million proteins (nature.com)). That speed matters in the field of protein structure prediction, where time and computing resources are often limited.
The ESM2 language model underlying ESMFold has up to 15 billion parameters, making it one of the largest protein language models evaluated to date. ESMFold was evaluated on the CAMEO and CASP14 test datasets and compared to both AlphaFold2 and another model, RoseTTAFold. ESMFold’s template modeling score (TM-score) was 83 on the CAMEO test dataset and 68 on the CASP14 test dataset, reported on a 0 to 100 scale (equivalently 0.83 and 0.68). The TM-score is a measure of the similarity between the predicted and experimentally determined protein structures, with a higher score indicating a more accurate prediction (Meta’s Genomics AI ESMFold Predicts Protein Structure 6x Faster Than AlphaFold2 (infoq.com)).
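To make the TM-score less abstract, here is a minimal sketch of the standard formula in Python. It assumes the predicted and reference structures have already been superimposed and that you have the distance (in angstroms) between each pair of aligned residues; real TM-score tools also search for the superposition that maximizes the score, which this sketch does not attempt. The function names `tm_d0` and `tm_score` and the 0.5 floor for very short chains are illustrative choices, not part of ESMFold. The result lies between 0 and 1, so multiply by 100 to compare it with scores quoted on a 0 to 100 scale.

```python
def tm_d0(target_length: int) -> float:
    """Length-dependent distance scale d0 (angstroms) used by the TM-score."""
    # Assumption: for very short chains the cube-root formula is undefined or
    # too small, so fall back to a floor of 0.5 angstroms.
    if target_length <= 15:
        return 0.5
    return max(0.5, 1.24 * (target_length - 15) ** (1.0 / 3.0) - 1.8)


def tm_score(distances, target_length: int) -> float:
    """TM-score for one fixed superposition.

    distances: per-residue distances (angstroms) between aligned residue pairs.
    target_length: number of residues in the reference structure; residues
    without an aligned partner simply contribute nothing to the sum.
    """
    d0 = tm_d0(target_length)
    total = sum(1.0 / (1.0 + (d / d0) ** 2) for d in distances)
    return total / target_length
```

A perfect prediction (all distances zero, every residue aligned) scores 1.0, and each residue that sits exactly d0 away from its true position contributes half of its maximum weight.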
ESMFold’s application is not limited to predicting the structure of individual proteins. In November 2022, Meta released a database called the ESM Metagenomic Atlas, which contains predicted structures for more than 600 million putative proteins. The database was created by using ESMFold to predict the structures of proteins found in microbes in the soil, deep in the ocean, and even inside our bodies (Meta AI releases models of over 600 million potential proteins (acs.org)).
The ESMFold model is available for download on the Meta Fundamental AI Research Protein Team’s GitHub page. The repository contains pre-trained weights for ESMFold, ESM2, MSA Transformer, ESM-1v, and ESM-IF1 (GitHub – facebookresearch/esm: Evolutionary Scale Modeling (esm): Pretrained language models for proteins). Additionally, Hugging Face hosts code and pre-trained weights for Transformer protein language models from Meta AI’s Fundamental AI Research Team, including ESMFold and ESM2 (ESM (huggingface.co)).
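As a rough illustration of the Hugging Face route, the sketch below folds a single sequence with the `transformers` library. It assumes `torch` and `transformers` are installed and that the `facebook/esmfold_v1` checkpoint is the one you want; the helper names `clean_sequence` and `predict_positions` are made up for this example, and restricting input to the 20 standard amino acids is a conservative choice of the sketch, not a requirement of the model.

```python
VALID_RESIDUES = set("ACDEFGHIKLMNPQRSTVWY")  # the 20 standard amino acids


def clean_sequence(raw: str) -> str:
    """Strip whitespace, upper-case, and reject non-standard residue letters."""
    seq = "".join(raw.split()).upper()
    unknown = sorted(set(seq) - VALID_RESIDUES)
    if not seq or unknown:
        raise ValueError(f"empty sequence or unsupported residues: {unknown}")
    return seq


def predict_positions(raw_sequence: str, checkpoint: str = "facebook/esmfold_v1"):
    """Fold one sequence and return the predicted atom-position tensor."""
    # Imported lazily so clean_sequence works without these heavy packages.
    import torch
    from transformers import AutoTokenizer, EsmForProteinFolding

    tokenizer = AutoTokenizer.from_pretrained(checkpoint)
    model = EsmForProteinFolding.from_pretrained(checkpoint).eval()

    inputs = tokenizer(
        [clean_sequence(raw_sequence)],
        return_tensors="pt",
        add_special_tokens=False,
    )
    with torch.no_grad():  # inference only, no gradients needed
        outputs = model(**inputs)
    return outputs.positions


# Example (downloads several gigabytes of weights on first use):
# positions = predict_positions("MLKNVQVQLV")
# print(positions.shape)
```

The model is large, so a GPU with plenty of memory is advisable for anything beyond short test sequences.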
In conclusion, ESMFold from Meta is a groundbreaking AI model for protein structure prediction that uses the representations from a large language model to generate accurate predictions from protein sequences. The model’s speed and accuracy make it a valuable tool in the field of protein structure prediction and have led to the creation of a vast database of putative protein structures. The availability of ESMFold’s pre-trained weights and code allows researchers worldwide to use the model and further advance our understanding of protein structures and functions.