Authors:
Piran Mehran, Sepahi Neda, Piran Mehrdad, Fernandes Pedro L, Ghanbariasad Ali
Abstract
Motivation: Important molecular information is hidden in the ocean of big data and could be uncovered by recognizing true relationships between different molecules. The human mind is very limited in its ability to find all molecular connections. Therefore, we introduce an integrated data mining strategy to find all possible relationships between molecular components in a biological context. To demonstrate how this approach works, we applied it to the proto-oncogene c-Src.
Results: Here we applied a data mining scheme to genomic, literature and signaling databases to obtain the biological information necessary for pathway inference. Using the R programming language, two large edgelists were constructed from the KEGG and OmniPath signaling databases. Next, an R script was developed to discover pathways by assembling edge information in the constructed signaling networks. Then, valid pathways were distinguished from invalid ones using molecular information from articles and genomic data analysis. Pathway inference was performed on predicted pathways starting with Src and ending with the differentially expressed genes (DEGs) whose expression was affected by c-Src overactivation. Moreover, some positive and negative feedback loops were proposed based on the gene expression results. This simple but practical flowchart will offer new insights into interactions between cellular components and help biologists look for possible molecular relationships that have been reported neither in signaling databases nor as signaling pathways.
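The pathway-discovery step (assembling edges from a source node until a DEG is reached) can be illustrated with a minimal sketch. It is written in Python rather than the authors' R script. It assumes that a candidate pathway is a simple directed path, with no repeated nodes, from Src to any DEG, and that path length is capped. The edgelist, the node names (X1, DEG_A, etc.), the function names and the `max_len` parameter are all hypothetical. The sketch does not query KEGG or OmniPath, and it does not model the later validation against literature and expression data.

```python
from collections import defaultdict

# Hypothetical toy edgelist of directed (source, target) interactions.
# In the described workflow, edges would instead come from KEGG and OmniPath.
EDGES = [
    ("SRC", "X1"),
    ("SRC", "X2"),
    ("X1", "X3"),
    ("X2", "X3"),
    ("X3", "DEG_A"),
    ("X1", "DEG_B"),
]


def build_adjacency(edges):
    """Convert an edgelist into a directed adjacency map."""
    adj = defaultdict(list)
    for src, dst in edges:
        adj[src].append(dst)
    return adj


def find_pathways(edges, start, targets, max_len=5):
    """Enumerate simple directed paths from `start` to any node in `targets`.

    `max_len` (a hypothetical parameter) caps the number of edges per path
    so the depth-first search stays bounded on large networks.
    """
    adj = build_adjacency(edges)
    targets = set(targets)
    found = []

    def dfs(node, path):
        # Record the path whenever it ends at a target gene.
        if node in targets and len(path) > 1:
            found.append(list(path))
        # Stop extending once the edge budget is used up.
        if len(path) - 1 >= max_len:
            return
        for nxt in adj.get(node, []):
            if nxt not in path:  # keep paths simple (no repeated nodes)
                path.append(nxt)
                dfs(nxt, path)
                path.pop()

    dfs(start, [start])
    return found


if __name__ == "__main__":
    for p in find_pathways(EDGES, "SRC", {"DEG_A", "DEG_B"}):
        print(" -> ".join(p))
```

On a real signaling network, the number of simple paths grows very quickly with length. This is why a length cap, followed by a separate filtering step such as the literature and genomic validation described above, matters in practice.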
Publisher
Cold Spring Harbor Laboratory