Wednesday, November 16, 2022

Protein folding by AI: wrinkles

Tech giants Alphabet and Meta (the parent companies of Google and Facebook) have applied their Artificial Intelligence (AI) to fold proteins computationally, predicting 3-dimensional shapes from 1-dimensional sequence data. Meta’s paper is still paywalled (preprint), but AlphaFold’s Nature papers from last year are available (Jumper, Tunyasuvunakool).

The AlphaFold authors noted that the ~100,000 protein structures determined by conventional experimental means are a small portion of the “billions” extant in nature. Previous approaches “focus on either the physical interactions or the evolutionary history”; by their account, these either rely on the availability of close homologues or work for only a few small proteins, and are otherwise “computationally intractable” (too hard). They evaluated AlphaFold on the 87 protein domains comprising the 14th Critical Assessment of (protein) Structure Prediction (CASP14) dataset, whose structures had not yet been deposited in the public Protein Data Bank (PDB). This permits a ‘blind’ (a priori) comparison of AI methods, by comparing their predictions with the newly-solved structures.

Fig 1. a. Scores. b. Backbone. c. Side chains
By this measure, AlphaFold is much better than its competitors (Fig 1a, shown, predicted vs experimental). It gains accuracy on backbone and side chains (1b, c) “by incorporating novel neural network architectures and training procedures based on the evolutionary, physical and geometric constraints of protein structures”. Using “multiple sequence alignments (MSAs) and pairwise” comparisons, it “predicts the 3D coordinates of all heavy atoms for a given protein using the primary amino acid sequence and aligned sequences of homologues as inputs”. So give it a bunch of similar sequences and structures and voilà! It gives the ‘new’ one. They describe the process, and you can download the code to inspect, modify, and run yourself (open source).
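
To make the predicted-vs-experimental comparison concrete, here is a minimal Python sketch of two scores commonly used for this kind of check: a C-alpha RMSD and a GDT_TS-style percentage. It is an illustration only, not AlphaFold's or CASP's actual evaluation code. It assumes the two structures are already superimposed and matched residue by residue (real assessments search for the best superposition), the function names are hypothetical, and the coordinates are made-up toy values.

```python
import math


def ca_rmsd(predicted, experimental):
    """Root-mean-square deviation (in angstroms) between matched C-alpha atoms.

    Assumes both lists hold (x, y, z) tuples in the same residue order and
    that the structures are already superimposed.
    """
    if not predicted or len(predicted) != len(experimental):
        raise ValueError("need two non-empty coordinate lists of equal length")
    squared = sum(math.dist(p, e) ** 2 for p, e in zip(predicted, experimental))
    return math.sqrt(squared / len(predicted))


def gdt_ts_like(predicted, experimental, cutoffs=(1.0, 2.0, 4.0, 8.0)):
    """Simplified GDT_TS-style score on a 0-100 scale.

    For each distance cutoff, take the fraction of residues whose predicted
    C-alpha lies within that cutoff of the experimental position, then
    average the fractions. No superposition search is done here.
    """
    if not predicted or len(predicted) != len(experimental):
        raise ValueError("need two non-empty coordinate lists of equal length")
    distances = [math.dist(p, e) for p, e in zip(predicted, experimental)]
    fractions = [
        sum(d <= cutoff for d in distances) / len(distances) for cutoff in cutoffs
    ]
    return 100.0 * sum(fractions) / len(cutoffs)


# Toy, made-up coordinates for a four-residue fragment (not real data).
experimental = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0), (11.4, 0.0, 0.0)]
predicted = [(0.0, 0.0, 0.0), (3.8, 0.5, 0.0), (7.6, 0.0, 3.0), (11.4, 0.0, 10.0)]

print(f"RMSD: {ca_rmsd(predicted, experimental):.2f} angstroms")
print(f"GDT_TS-like score: {gdt_ts_like(predicted, experimental):.1f}")
```

In this toy case a single badly placed residue dominates the RMSD, while the GDT-style score still credits the residues that were placed well, which is one reason assessments tend to report more than one number.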

While impressive, this is a very constrained set of structures, nothing justifying the claims made in the popular press of solving all proteins. To be comprehensive, it seems that AI will have to consider the biology and find ways to include amino-terminal-first synthesis, nucleation, domain folding, insertion into a membrane, and above all interaction with chaperone proteins.

