The probability of edge existence due to node degree: a baseline for network-based predictions

Author:

Zietz Michael (1, 2, 3), Himmelstein Daniel S (1, 4), Kloster Kyle (5, 6), Williams Christopher (1), Nagle Michael W (7, 8, 9), Greene Casey S (1, 10, 11)

Affiliation:

1. Department of Systems Pharmacology and Translational Therapeutics, University of Pennsylvania, Philadelphia, PA 19104, USA

2. Department of Physics & Astronomy, University of Pennsylvania, Philadelphia, PA 19104, USA

3. Department of Biomedical Informatics, Columbia University, New York, NY 10032, USA

4. Related Sciences, Denver, CO 80202, USA

5. Carbon, Inc., Redwood City, CA 94063, USA

6. Department of Computer Science, North Carolina State University, Raleigh, NC 27606, USA

7. Internal Medicine Research Unit, Pfizer Worldwide Research, Development, and Medical, Cambridge, MA 02139, USA

8. Integrative Biology, Internal Medicine Research Unit, Worldwide Research, Development, and Medicine, Pfizer Inc., Cambridge, MA 02139, USA

9. Human Biology Integration Foundation, Deep Human Biology Learning, Eisai Inc., Cambridge, MA 02140, USA

10. Department of Biochemistry and Molecular Genetics, University of Colorado School of Medicine, Aurora, CO 80045, USA

11. Center for Health AI, University of Colorado School of Medicine, Aurora, CO 80045, USA

Abstract

Important tasks in biomedical discovery, such as predicting gene functions, gene–disease associations, and drug repurposing opportunities, are often framed as network edge prediction. The number of edges connecting to a node, termed degree, can vary greatly across nodes in real biomedical networks, and the distribution of degrees varies between networks. If degree strongly influences edge prediction, then imbalance or bias in the distribution of degrees could lead to nonspecific or misleading predictions. We introduce a network permutation framework to quantify the effects of node degree on edge prediction. Using network permutation to generate features that depend only on degree, our framework decomposes performance into the proportion attributable to degree and the proportion attributable to the network’s specific connections. We discover that performance attributable to factors other than degree is often only a small portion of overall performance. Researchers seeking to predict new or missing edges in biological networks should use our permutation approach to obtain a baseline for performance that may be nonspecific because of degree. We released our methods as an open-source Python package (https://github.com/hetio/xswap/).
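
To illustrate the idea of a degree-only baseline described in the abstract, here is a minimal sketch in plain Python. It is not the xswap package's implementation or API; the function names (`degree_counts`, `degree_preserving_permutation`, `degree_only_edge_probability`), the swap counts, and the toy edge list are all assumptions chosen for illustration. The sketch repeatedly swaps the endpoints of two random edges of an undirected simple graph, which keeps every node's degree unchanged, and then estimates how often a given node pair is connected across permuted networks.

```python
import random
from collections import Counter


def degree_counts(edges):
    """Count how many edges touch each node (hypothetical helper)."""
    counts = Counter()
    for u, v in edges:
        counts[u] += 1
        counts[v] += 1
    return counts


def degree_preserving_permutation(edges, n_swaps, seed=0):
    """Shuffle an undirected simple graph while keeping every node's degree.

    Assumes `edges` has at least two edges, no self-loops, and no duplicates.
    """
    rng = random.Random(seed)
    edges = [tuple(sorted(e)) for e in edges]
    edge_set = set(edges)
    for _ in range(n_swaps):
        i, j = rng.sample(range(len(edges)), 2)
        (a, b), (c, d) = edges[i], edges[j]
        # Pick one of the two possible rewirings at random.
        if rng.random() < 0.5:
            c, d = d, c
        new_1 = tuple(sorted((a, d)))
        new_2 = tuple(sorted((c, b)))
        # Reject swaps that would create a self-loop or a duplicate edge.
        if a == d or c == b or new_1 == new_2:
            continue
        if new_1 in edge_set or new_2 in edge_set:
            continue
        edge_set -= {edges[i], edges[j]}
        edge_set |= {new_1, new_2}
        edges[i], edges[j] = new_1, new_2
    return edges


def degree_only_edge_probability(edges, pairs, n_permutations=200, seed=0):
    """Estimate, per node pair, the fraction of permuted networks containing it."""
    hits = {pair: 0 for pair in pairs}
    current = list(edges)
    for k in range(n_permutations):
        # Assumption: 10 swap attempts per edge is enough mixing for a toy graph.
        current = degree_preserving_permutation(
            current, n_swaps=10 * len(current), seed=seed + k
        )
        present = set(current)
        for pair in pairs:
            hits[pair] += tuple(sorted(pair)) in present
    return {pair: hits[pair] / n_permutations for pair in pairs}


if __name__ == "__main__":
    # Toy network: node 0 is a hub, nodes 4 and 5 have low degree.
    toy_edges = [(0, 1), (0, 2), (0, 3), (1, 2), (3, 4), (4, 5)]
    print(degree_only_edge_probability(toy_edges, [(0, 1), (4, 5)]))
```

Because the permuted networks share only the degree sequence with the original, a prediction score that performs about as well on them as on the real network is drawing mostly on degree rather than on the network's specific connections.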

Funder

Gordon and Betty Moore Foundation

National Institutes of Health

Publisher

Oxford University Press (OUP)

Cited by 1 article.
