Abstract
Extensive evidence shows that proteins associated with several diseases frequently interact with each other. This has led to the development of different network-based methods for uncovering the molecular workings of human diseases. These methods are based on the idea that protein interaction networks act as maps, where diseases manifest as localized perturbations within a neighborhood. Identifying these areas, known as disease modules, is essential for in-depth research into specific disease characteristics. While many computational methods have been developed, the underlying connectivity patterns within these modules have yet to be explored. This work aims to fill this gap by integrating multiple biological data sources through a non-negative matrix factorization (NMF) technique. We leverage two biological sources of information, protein-protein interactions (PPIs) and Gene Ontology data, to find connections between novel genes and diseases. The data sources are first converted into networks, which are then clustered to obtain modules. The two types of modules are then integrated through an NMF-based technique to obtain a set of meta-modules that preserve the essential characteristics of interaction patterns and functional similarity information among the proteins/genes. We assign multiple labels to each meta-module based on the statistical and biological properties it shares with the disease dataset. A multi-label classification technique is used to assign new disease labels to genes within each meta-module. A total of 3131 gene-disease associations are identified, which are also validated through a literature survey and through Gene Ontology and pathway-based analysis.
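To make the integration step more concrete, the following is a minimal sketch of how two sets of modules could be combined with NMF. It is an illustration under stated assumptions, not the method used in this work: the abstract does not specify the factorization objective or how the module matrices are built, so the sketch assumes binary gene-by-module membership matrices stacked side by side and factored with standard multiplicative updates. The names `nmf`, `ppi_membership`, `go_membership`, and the toy data are all hypothetical.

```python
import numpy as np


def nmf(V, rank, n_iter=200, seed=0, eps=1e-9):
    """Factor a non-negative matrix V (n x m) into W (n x rank) and H (rank x m).

    Uses multiplicative updates for the squared Frobenius error ||V - WH||^2.
    This is a generic NMF routine, assumed here for illustration only.
    """
    rng = np.random.default_rng(seed)
    n, m = V.shape
    W = rng.random((n, rank)) + eps
    H = rng.random((rank, m)) + eps
    for _ in range(n_iter):
        # Update H with W fixed, then W with H fixed; eps avoids division by zero.
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H


# Hypothetical toy data: 6 genes, 2 PPI-derived modules, 2 GO-derived modules.
# Entry (i, j) is 1 if gene i belongs to module j.
ppi_membership = np.array(
    [[1, 0], [1, 0], [1, 0], [0, 1], [0, 1], [0, 1]], dtype=float
)
go_membership = np.array(
    [[1, 0], [1, 0], [0, 1], [0, 1], [0, 1], [0, 1]], dtype=float
)

# Assumption: integrate by stacking the two membership matrices column-wise.
V = np.hstack([ppi_membership, go_membership])

W, H = nmf(V, rank=2)

# Assign each gene to the latent factor (meta-module) with the largest weight.
meta_module = W.argmax(axis=1)
```

In this sketch each row of `W` gives a gene's weight in every latent factor, and each row of `H` shows how strongly that factor draws on the PPI and GO modules. Taking the largest weight per gene is one simple way to read off meta-module membership; a soft or thresholded assignment would work equally well.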
Publisher
Cold Spring Harbor Laboratory