Abstract
From transcription to decay, RNA-binding proteins (RBPs) influence RNA metabolism. Using the RBP2GO database, which combines proteome-wide RBP screens from 13 species, we investigated the RNA-binding features of 176,896 proteins. By compiling published lists of RNA-binding domains (RBDs) and RNA-related protein family (Rfam) IDs with lists from the InterPro database, we analyzed the distribution of the RBDs and Rfam IDs in RBPs and non-RBPs to select RBDs and Rfam IDs that were enriched in RBPs. We also examined the intrinsically disordered region (IDR) content of proteins and found a strong positive correlation between IDRs and RBDs. Our bioinformatic analysis indicated that RBDs and Rfam IDs were strong indicators of the RNA-binding potential of proteins and helped predict new RBP candidates, especially in species with few available proteome-wide RBP identification studies. By further analyzing RBPs with no known RBD, we also predicted 15 new RBDs that were experimentally validated by RNA-bound peptides. Finally, we created the new RBP2GO composite score by combining the RBP2GO score with new quality factors linked to RBDs and Rfam IDs. Based on the RBP2GO composite score, we compiled a list of 2,018 high-confidence human RBPs. The knowledge collected here was integrated into the RBP2GO database at https://RBP2GO.dkfz.de.

Key Points
Comprehensive analysis of RNA-related protein domains and families enriched in RNA-binding proteins (RBPs)
Pan-species prediction of new RBPs, and prediction and validation of new RNA-binding domains
Online resource with complete dataset including high-confidence human RBPs according to a new scoring system
Publisher
Cold Spring Harbor Laboratory