Abstract
Despite the identification of hundreds of risk genes for primary open-angle glaucoma (POAG), a significant portion of the POAG genetic risk landscape remains unexplored. We hypothesized that unsupervised learning on large protein-protein interaction (PPI) networks could enable comprehensive characterization of the genetic pathways that underlie POAG risk. We used graph representation learning on a proteome-scale PPI network to generate embeddings capturing complex features of each protein’s interactions. Using these embeddings, we trained a model with POAG-associated genes from the DisGeNET database to output an inferred POAG risk score for over 12,000 gene products, which identified known POAG risk genes with an area under the receiver operating characteristic curve of 0.739 (95% CI 0.686-0.792). These included well-known POAG risk genes such as RHOA and MMP3, as well as genes with significant contributions to other ocular diseases. Pathway analysis on the proteome-wide risk scores implicated 20 biological processes in POAG pathogenesis. Furthermore, cluster analysis of embeddings for POAG risk genes revealed 5 distinct functional neighborhoods, including cytokine signaling, coagulation response, collagen biosynthesis, extracellular matrix development, and fatty acid metabolism. Our results suggest that representation learning can recognize important patterns of protein interaction that allow in silico prioritization of POAG risk genes and pathways.
Publisher
Cold Spring Harbor Laboratory