Abstract
Motivation: As only 35% of human proteins have (often partial) PDB structures, the protein structure prediction tool AlphaFold2 (AF2) could have a massive impact on human biology and medicine, making independent benchmarks of interest. We studied AF2’s ability to describe backbone solvent exposure as an easily interpretable “natural coordinate” of protein conformation, using human proteins as a test case.

Results: After screening for appropriate comparative sets, we matched 1818 human proteins predicted by AF2 against 7585 unique experimental PDBs, and after curation for sequence overlap, we assessed 1264 comparative pairs comprising 115 unique AF2 structures and 652 unique experimental structures. AF2 performed excellently for monomeric proteins but markedly worse for multimers, whereas ligands, cofactors, and experimental resolution had, interestingly, little effect on performance. Challenges relating to specific groups of residues and to multimers were analyzed. We identified larger errors for residues with lower confidence scores (pLDDT) and for exposed residues, and polar residues (e.g., Asp, Glu, Asn) were described less accurately than hydrophobic residues. Proline conformations were the hardest to predict, probably because prolines are commonly located in dynamic, solvent-accessible regions. In summary, using solvent exposure as a natural metric of local conformation, we quantify the performance of AF2 for human proteins and provide estimates of the expected error as a function of ligand presence, multimer/monomer status, resolution, local residue solvent exposure, pLDDT, and amino acid type. Overall performance was found to be excellent.

Availability and Implementation: Scripts used to perform benchmarking are available at https://github.com/ktbaek/AlphaFold.
Publisher
Cold Spring Harbor Laboratory