Defining the extent of gene function using ROC curvature

Author:

Fischer Stephan (1, 2), Gillis Jesse (1, 3)

Affiliation:

1. Cold Spring Harbor Laboratory, Stanley Institute for Cognitive Genomics, Cold Spring Harbor, NY 11724, USA

2. Institut Pasteur, Université Paris Cité, Bioinformatics and Biostatistics Hub, Paris F-75015, France

3. Department of Physiology, University of Toronto, Toronto, ON, Canada

Abstract

Motivation: Interactions between proteins help us understand how genes are functionally related and how they contribute to phenotypes. Experiments provide imperfect ‘ground truth’ information about a small subset of potential interactions in a specific biological context, which can then be extended to the whole genome across different contexts, such as conditions, tissues or species, through machine learning methods. However, evaluating the performance of these methods remains a critical challenge. Here, we propose to evaluate the generalizability of gene characterizations through the shape of performance curves.

Results: We identify Functional Equivalence Classes (FECs), subsets of annotated and unannotated genes that jointly drive performance, by assessing the presence of straight lines in ROC curves built from gene-centric prediction tasks, such as function or interaction predictions. FECs are widespread across data types and methods, and they can be used to evaluate the extent and context-specificity of functional annotations in a data-driven manner. For example, FECs suggest that B cell markers can be decomposed into shared primary markers (10–50 genes) and tissue-specific secondary markers (100–500 genes). In addition, FECs suggest the existence of functional modules that span a wide range of the genome, with marker sets spanning at most 5% of the genome and data-driven extensions of Gene Ontology sets spanning up to 40% of the genome. Simple to assess visually and statistically, the identification of FECs in performance curves paves the way for novel functional characterization and increased robustness in the definition of functional gene sets.

Availability and implementation: Code for analyses and figures is available at https://github.com/yexilein/pyroc.

Supplementary information: Supplementary data are available at Bioinformatics online.
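To make the notion of a straight line in a ROC curve concrete, here is a minimal sketch in Python. It is not the authors' pyroc code, and the function names `roc_points` and `max_deviation_from_chord` are hypothetical. As a stand-in for a proper statistical test, it assumes that a segment can be called "straight" when the ROC points stay close to the chord joining its two endpoints. The toy ranking is invented for illustration: 5 top-ranked positives, then a block of 20 genes in which positives are evenly mixed with negatives (1 in 4), then 25 negatives.

```python
def roc_points(labels):
    """ROC points (FPR, TPR) for binary labels sorted by decreasing score."""
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    tp = fp = 0
    points = [(0.0, 0.0)]
    for y in labels:
        if y:
            tp += 1
        else:
            fp += 1
        points.append((fp / n_neg, tp / n_pos))
    return points


def max_deviation_from_chord(points, start, end):
    """Largest vertical distance between the ROC points in [start, end]
    and the straight line joining points[start] and points[end].

    Assumption: a small value is read as 'this segment is roughly straight'.
    """
    x0, y0 = points[start]
    x1, y1 = points[end]
    if x1 == x0:
        # Vertical segment: every point shares the same FPR, so it is a line.
        return 0.0
    slope = (y1 - y0) / (x1 - x0)
    return max(abs(y - (y0 + slope * (x - x0)))
               for x, y in points[start:end + 1])


# Toy ranking (illustrative only, not data from the paper).
labels = [1] * 5 + [1, 0, 0, 0] * 5 + [0] * 25
points = roc_points(labels)

# Deviation inside the evenly mixed block (ranks 5 to 25) versus the whole curve.
block_dev = max_deviation_from_chord(points, 5, 25)
whole_dev = max_deviation_from_chord(points, 0, len(points) - 1)
print(f"mixed block: {block_dev:.3f}, whole curve: {whole_dev:.3f}")
```

In this toy case the mixed block hugs its chord much more closely than the curve as a whole does, which is the visual signature the abstract refers to: within such a block, annotated and unannotated genes are ranked interchangeably.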

Funder

National Institutes of Health

Publisher

Oxford University Press (OUP)

Subject

Computational Mathematics, Computational Theory and Mathematics, Computer Science Applications, Molecular Biology, Biochemistry, Statistics and Probability


Cited by 2 articles.