Authors:
Halder Anup Kumar, Bandyopadhyay Soumyendu Sekhar, Jedrzejewski Witold, Basu Subhadip, Sroka Jacek
Abstract
Large-scale protein-protein interaction (PPI) networks of an organism provide key insights into its cellular and molecular functions, signaling pathways and underlying disease mechanisms. For any organism, the number of unexplored protein interactions far exceeds the number of all known positive and negative interactions. For human, all known PPI datasets together contain only ∼ 5.61 million positive and ∼ 0.76 million negative interactions, which amount to ∼ 3.1% of potential interactions. Moreover, conventional PPI prediction methods produce only binary results. At the same time, recent studies show that protein binding affinities may prove effective in detecting protein complexes, analysing disease associations, reconstructing signaling networks, etc. With this in mind, we present a fuzzy semantic scoring function that uses the Gene Ontology (GO) graphs to assess the binding affinity between any two proteins at the organism level. We have implemented a distributed algorithm in Apache Spark that computes this function, and used it to process a human PPI network of ∼ 180 million potential interactions arising from 18 994 reviewed proteins for which GO annotations are available. The quality of the computed scores has been validated against available state-of-the-art methods on benchmark datasets. The resulting scores are published via a web server for non-commercial use at: http://fuzzyppi.mimuw.edu.pl/.
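The figure of ∼ 180 million potential interactions is consistent with counting unordered pairs of distinct proteins, n(n − 1)/2 with n = 18 994. The short check below is only a sketch of that arithmetic; it assumes that self-interactions are excluded, which the abstract does not state explicitly.

```python
from math import comb

# Number of reviewed human proteins with GO annotations (figure from the abstract).
N_PROTEINS = 18_994

# Unordered pairs of distinct proteins: n * (n - 1) / 2.
# Assumption: self-interactions (a protein paired with itself) are not counted.
potential_pairs = comb(N_PROTEINS, 2)

print(f"{potential_pairs:,}")  # prints 180,376,521
```

Including the 18 994 self-pairs would add less than 0.02% to this total, so the ∼ 180 million estimate holds under either counting convention.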
Publisher
Cold Spring Harbor Laboratory