'ChatGPT moment for biology': Ex-Meta scientists develop AI model that creates proteins 'not found in nature'

Just as ChatGPT generates text by predicting the word most likely to follow in a sequence, a new artificial intelligence (AI) model can write entirely new proteins from scratch that do not occur in nature.

Scientists used the new model, ESM3, to create a new fluorescent protein that shares only 58% of its sequence with naturally occurring fluorescent proteins, they said in a study published July 2 on the preprint bioRxiv database. Representatives from EvolutionaryScale, a company formed by former Meta researchers, also outlined details June 25 in a statement.
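To make a figure like "58% of its sequence" concrete, here is a minimal sketch of how percent identity between two already-aligned protein sequences can be computed. The function name `percent_identity`, the gap character, and the toy sequences are illustrative assumptions, and the study may define identity differently (for example, with a different alignment method or denominator).

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent of aligned, non-gap positions where both sequences share a residue.

    Assumes the two strings are already aligned to equal length; this sketch
    does not perform the alignment itself.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")

    compared = 0  # positions where neither sequence has a gap
    matches = 0   # compared positions with identical residues
    for res_a, res_b in zip(aligned_a, aligned_b):
        if res_a == gap or res_b == gap:
            continue
        compared += 1
        if res_a == res_b:
            matches += 1

    return 100.0 * matches / compared if compared else 0.0


# Toy sequences, not real fluorescent proteins.
identity = percent_identity("ACDEFGHIKL", "ACDEFGHAAA")
print(f"{identity:.1f}% identical")
```

Real comparisons first align the sequences with a dedicated tool, because insertions and deletions shift positions and a naive position-by-position comparison would understate the similarity.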

The research team has released a small version of the model under a non-commercial license and will make the large version of the model available to commercial researchers. According to EvolutionaryScale, the technology could be useful in fields ranging from drug discovery to designing new chemicals for plastic degradation.

ESM3 is a large language model (LLM) similar to OpenAI's GPT-4, which powers the ChatGPT chatbot, and the scientists trained their largest version on 2.78 billion proteins. For each protein, they extracted information about sequence (the order of the amino acid building blocks that make up the protein), structure (the three-dimensional folded shape of the protein), and function (what the protein does). They randomly masked pieces of this information and asked ESM3 to predict the missing pieces.
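The masking step described above can be illustrated with a short sketch. This is not EvolutionaryScale's code: it masks only the amino acid sequence (not structure or function), and the `mask_sequence` helper, the `_` mask symbol, and the 25% masking rate are all assumptions chosen for illustration.

```python
import random

MASK = "_"  # hypothetical mask symbol; real models use their own special token


def mask_sequence(sequence: str, mask_fraction: float = 0.25, seed=None):
    """Hide a random subset of residues and return (masked_sequence, targets).

    `targets` maps each masked position to the residue that was hidden there,
    which is what a model would be trained to predict.
    """
    if not sequence:
        return "", {}

    rng = random.Random(seed)
    # Mask at least one residue, and never more than the sequence contains.
    n_mask = min(len(sequence), max(1, round(len(sequence) * mask_fraction)))
    positions = sorted(rng.sample(range(len(sequence)), n_mask))

    residues = list(sequence)
    targets = {}
    for pos in positions:
        targets[pos] = residues[pos]
        residues[pos] = MASK

    return "".join(residues), targets


# Toy 20-residue sequence used purely as an example input.
masked, targets = mask_sequence("MSKGEELFTGVVPILVELDG", mask_fraction=0.25, seed=0)
print(masked)   # sequence with five positions hidden
print(targets)  # the hidden residues a model would have to recover
```

During training, a model sees only the masked version and is scored on how well it recovers the hidden residues. Repeating this over billions of proteins is what lets it later fill in, or generate, plausible sequences.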

They scaled this model up from research that the same team was conducting while still at Meta. In 2022 they announced ESMFold, a precursor to ESM3 that predicted unknown microbial protein structures. That year, Alphabet's DeepMind also predicted protein structures for 200 million proteins.

Scientists subsequently pointed out that there are limitations to these AI models' predictions and that the protein predictions need to be verified. But the methods can still massively speed up the search for protein structures, because the alternative is to use X-rays to map out protein structures one by one — which is slow and costly.
