PCfun: a hybrid computational framework for systematic characterization of protein complex function

Author:

Sharma Varun S (1,2), Fossati Andrea (3,4), Ciuffa Rodolfo (1), Buljan Marija (5,6), Williams Evan G (7), Chen Zhen (8), Shao Wenguang (1), Pedrioli Patrick G A (1), Purcell Anthony W (9), Martínez María Rodríguez (10), Song Jiangning (9,11), Manica Matteo (10), Aebersold Ruedi (1,12), Li Chen (1,9)

Affiliation:

1. Department of Biology, Institute of Molecular Systems Biology, ETH Zurich, Switzerland

2. CeMM Research Center for Molecular Medicine of the Austrian Academy of Sciences, Vienna, Austria

3. Quantitative Biosciences Institute (QBI) and Department of Cellular and Molecular Pharmacology, University of California, San Francisco, CA 94158, USA

4. J. David Gladstone Institutes, San Francisco, CA 94158, USA

5. Empa - Swiss Federal Laboratories for Materials Science and Technology, St. Gallen, Switzerland

6. Swiss Institute of Bioinformatics (SIB), Lausanne, Switzerland

7. Luxembourg Centre for Systems Biomedicine, University of Luxembourg, Esch-sur-Alzette, Luxembourg

8. Collaborative Innovation Center of Henan Grain Crops, Henan Agricultural University, Zhengzhou 450046, China

9. Monash Biomedicine Discovery Institute and Department of Biochemistry and Molecular Biology, Monash University, Melbourne, VIC 3800, Australia

10. IBM Research Europe, Zurich, Switzerland

11. Monash Data Futures Institute, Monash University, Melbourne, VIC 3800, Australia

12. Faculty of Science, University of Zurich, Switzerland

Abstract

In molecular biology, it is generally assumed that the ensemble of expressed molecules, together with their activities and interactions, determines biological function, cellular states and phenotypes. Stable protein complexes, or macromolecular machines, are in turn the key functional entities mediating and modulating most biological processes. Although protein complexes and their subunit composition can now be identified inexpensively and at scale, determining their function remains challenging and labor intensive. This study describes the Protein Complex Function predictor (PCfun), the first computational framework for the systematic annotation of protein complex functions using Gene Ontology (GO) terms. PCfun is built upon a word embedding derived with natural language processing techniques from 1 million open access PubMed Central articles. Specifically, PCfun combines two approaches to identify protein complex function: (i) an unsupervised approach that retrieves the nearest neighbor (NN) GO term word vectors for a protein complex query vector, and (ii) a supervised approach using Random Forest (RF) models trained specifically to recover the GO terms of protein complex queries described in the CORUM protein complex database. PCfun consolidates both approaches by performing a hypergeometric test for enrichment of the top NN GO terms within the child terms of the GO terms predicted by the RF models. The documentation and implementation of the PCfun package are available at https://github.com/sharmavaruns/PCfun. We anticipate that PCfun will serve as a useful tool and a novel paradigm for the large-scale characterization of protein complex function.
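The abstract describes the unsupervised nearest-neighbor step and the hypergeometric consolidation only at a high level. The following is a minimal Python sketch under stated assumptions, not the PCfun implementation (the repository above holds that). It assumes that a complex's query vector is the mean of its subunits' embedding vectors, that nearest neighbors are ranked by cosine similarity, and that the child terms of an RF-predicted GO term are supplied as a ready-made set (no Random Forest is trained here). The function names `complex_query_vector`, `nearest_go_terms`, `hypergeom_sf` and `enrichment_p` are hypothetical, and the GO IDs in the usage example are placeholders.

```python
import math
from typing import Dict, List, Sequence, Set, Tuple

Vector = Sequence[float]


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    if nu == 0.0 or nv == 0.0:
        return 0.0
    return dot / (nu * nv)


def complex_query_vector(subunit_vectors: List[Vector]) -> List[float]:
    """ASSUMPTION: represent a complex as the element-wise mean of its subunit vectors."""
    n = len(subunit_vectors)
    dim = len(subunit_vectors[0])
    return [sum(vec[i] for vec in subunit_vectors) / n for i in range(dim)]


def nearest_go_terms(query: Vector, go_vectors: Dict[str, Vector], k: int) -> List[Tuple[str, float]]:
    """Rank GO term vectors by cosine similarity to the query and keep the top k."""
    scored = [(term, cosine(query, vec)) for term, vec in go_vectors.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


def hypergeom_sf(overlap: int, population: int, successes: int, draws: int) -> float:
    """Upper-tail P(X >= overlap) for X ~ Hypergeometric(population, successes, draws)."""
    denom = math.comb(population, draws)
    upper = min(successes, draws)
    total = sum(
        math.comb(successes, x) * math.comb(population - successes, draws - x)
        for x in range(overlap, upper + 1)
    )
    return total / denom


def enrichment_p(top_nn: Set[str], child_terms: Set[str], universe: Set[str]) -> float:
    """Test whether the top-NN GO terms are over-represented among the child terms
    of one RF-predicted GO term; all counts are restricted to the given universe."""
    drawn = top_nn & universe
    hits = child_terms & universe
    return hypergeom_sf(len(drawn & hits), len(universe), len(hits), len(drawn))


if __name__ == "__main__":
    # Toy 2-D "embeddings"; real word vectors would be high-dimensional.
    go = {"GO:A": [1.0, 0.0], "GO:B": [0.9, 0.1], "GO:C": [0.0, 1.0], "GO:D": [-1.0, 0.0]}
    query = complex_query_vector([[1.0, 0.2], [0.8, 0.0]])
    top = nearest_go_terms(query, go, k=2)
    universe = {f"GO:{c}" for c in "ABCDEFGHIJ"}
    p = enrichment_p({t for t, _ in top}, {"GO:A", "GO:B", "GO:E"}, universe)
    print(top, p)
```

In this toy run both top-2 neighbors fall among the three child terms out of a 10-term universe, so the upper-tail probability is C(3,2)·C(7,0)/C(10,2) = 3/45. A library routine such as SciPy's `hypergeom.sf(overlap - 1, population, successes, draws)` computes the same tail and scales better than the exact `math.comb` sum used here for transparency.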

Funder

European Research Council

Swiss National Science Foundation

National Health and Medical Research Council of Australia

CJ Martin Early Career Research Fellowship

Publisher

Oxford University Press (OUP)

Subject

Molecular Biology, Information Systems
