Tech giants Alphabet and Meta (the parent companies of Google and Facebook) have applied artificial intelligence (AI) to protein folding, computationally predicting 3-dimensional shapes from 1-dimensional sequence data. Meta’s paper is so far available only as a preprint, but AlphaFold’s Nature papers from last year are available (Jumper, Tunyasuvunakool).
The AlphaFold authors note that the ~100,000 protein structures determined by conventional experimental means are a small fraction of the “billions” of known protein sequences. Previous approaches “focus on either the physical interactions or the evolutionary history”. By the authors’ account, physics-based methods work for only a few small proteins and are otherwise “computationally intractable” (too hard), while evolution-based methods rely on the availability of close homologues. They evaluated AlphaFold on the 87 protein domains of the 14th Critical Assessment of (protein) Structure Prediction (CASP14) dataset, whose structures had not yet been deposited in the public Protein Data Bank (PDB). This permits a ‘blind’ comparison of AI methods: predictions are made before the experimental structures are released and are then compared with the newly solved structures.
Fig 1. a. Scores. b. Backbone. c. Side chains.
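To make the comparison of a prediction with a newly solved structure concrete, here is a minimal sketch of one common backbone measure: the root-mean-square deviation (RMSD) of matched C-alpha atoms after optimal superposition (the Kabsch algorithm). The function name `kabsch_rmsd` and the assumption that both structures arrive as equally sized arrays of matched coordinates are illustrative. This is not the CASP scoring pipeline or the AlphaFold authors' code, and it omits refinements such as scoring only a subset of residues.

```python
import numpy as np

def kabsch_rmsd(predicted, experimental):
    """RMSD between two matched coordinate sets after optimal superposition.

    Both inputs are (N, 3) arrays of C-alpha coordinates, where row i of
    `predicted` corresponds to row i of `experimental` (an assumption of
    this sketch; real structures need residue matching first).
    """
    P = np.asarray(predicted, dtype=float)
    Q = np.asarray(experimental, dtype=float)
    if P.shape != Q.shape or P.ndim != 2 or P.shape[1] != 3:
        raise ValueError("inputs must both have shape (N, 3)")

    # Remove translation by centring both sets on their centroids.
    P = P - P.mean(axis=0)
    Q = Q - Q.mean(axis=0)

    # Kabsch: SVD of the covariance matrix gives the optimal rotation.
    H = P.T @ Q
    U, _, Vt = np.linalg.svd(H)

    # Guard against an improper rotation (a reflection), which would
    # wrongly superimpose a structure onto its mirror image.
    d = 1.0 if np.linalg.det(Vt.T @ U.T) >= 0 else -1.0
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T

    # Rotate the prediction onto the experimental structure and measure.
    P_rot = P @ R.T
    return float(np.sqrt(np.mean(np.sum((P_rot - Q) ** 2, axis=1))))
```

A prediction that differs from the experimental structure only by rotation and translation scores close to zero, while a mirror image does not, which is why the reflection guard matters for chiral molecules such as proteins.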
While the results are impressive, the CASP14 set is a very constrained set of structures and does not justify the claims made in the popular press that the structures of all proteins have been solved. To be comprehensive, it seems that AI will have to take biology into account, finding ways to include amino-terminal-first synthesis, nucleation, domain folding, insertion into a membrane, and above all interaction with chaperone proteins.
Jumper J, Evans R, Pritzel A, Green T, Figurnov M, Ronneberger O, Tunyasuvunakool K, Bates R, Žídek A, Potapenko A, Bridgland A, Meyer C, Kohl SAA, Ballard AJ, Cowie A, Romera-Paredes B, Nikolov S, Jain R, Adler J, Back T, Petersen S, Reiman D, Clancy E, Zielinski M, Steinegger M, Pacholska M, Berghammer T, Bodenstein S, Silver D, Vinyals O, Senior AW, Kavukcuoglu K, Kohli P, Hassabis D. Highly accurate protein structure prediction with AlphaFold. Nature. 2021 Aug;596(7873):583-589. doi: 10.1038/s41586-021-03819-2. Epub 2021 Jul 15. PMID: 34265844; PMCID: PMC8371605.