Author:
Zhao He,Liu Guixia,Cao Xintian
Abstract
Abstract
Background
The identification of essential proteins is of great significance in biology and pathology. However, protein–protein interaction (PPI) data obtained through high-throughput technology include a high number of false positives. To overcome this limitation, numerous computational algorithms based on biological characteristics and topological features have been proposed to identify essential proteins.
Results
In this paper, we propose a novel method named SESN for identifying essential proteins. It is a seed expansion method based on PPI sub-networks and multiple biological characteristics. Firstly, SESN utilizes gene expression data to construct PPI sub-networks. Secondly, seed expansion is performed simultaneously in each sub-network, and the expansion process is based on the topological features of predicted essential proteins. Thirdly, the error correction mechanism is based on multiple biological characteristics and the entire PPI network. Finally, SESN analyzes the impact of each biological characteristic, including protein complex, gene expression data, GO annotations, and subcellular localization, and adopts the biological data with the best experimental results. The output of SESN is a set of predicted essential proteins.
Conclusions
The analysis of each component of SESN indicates the effectiveness of all components. We conduct comparison experiments using three datasets from two species, and the experimental results demonstrate that SESN achieves superior performance compared to other methods.
Funder
National Natural Science Foundation of China
Science and Technology Development Program of Jilin Province
Key Laboratory for Symbol Computation and Knowledge Engineering of the National Education Ministry of China,Jilin University
Publisher
Springer Science and Business Media LLC
Subject
Applied Mathematics,Computer Science Applications,Molecular Biology,Biochemistry,Structural Biology