Abstract
The reduced cost of sequencing is leading to explosive growth in the number of available sequences, both across diverse genomes and for individual patients. Inference of meaningful functions for individual genes and proteins is lagging behind, which hinders deeper understanding of biological function and evolution. Traditionally, protein function has been determined either by time-consuming experimental methods or by sequence matching, which often does not agree with the experimental findings. We have significantly improved protein sequence matching by accounting for the inter-dependent amino acid substitutions observed within densely packed protein structures. This yields additional substitutions beyond those usually seen, and with them good matches to additional proteins, some having new functions, that are not identified by conventional sequence matching. In the current study, we apply this approach to predict novel functions for the proteins of HIV. The newly found functional annotations are then manually reviewed, and many are validated from the literature, shown here for the HIV envelope protein gp120. Some of these new functions are more specific than the existing annotations, and others are entirely novel. We also show statistically that, on average, our new functional annotations are more informative than those obtained with conventional substitution matrices such as BLOSUM62. These results suggest that the new ProtSub protein sequence matching, which incorporates structural information, generally yields better identification of related proteins, with broader coverage and often gains in identifying more specific functions.
Publisher
Cold Spring Harbor Laboratory