BioM2: biologically informed multi-stage machine learning for phenotype prediction using omics data

Authors:

Zhang Shunjie (1), Li Pan (2, 3), Wang Shenghan (2, 3), Zhu Jijun (2, 3), Huang Zhongting (2, 3), Cai Fuqiang (1), Freidel Sebastian (4, 5, 6, 7), Ling Fei (1), Schwarz Emanuel (4, 5, 6, 7), Chen Junfang (2, 3, 8)

Affiliation:

1. School of Biology and Biological Engineering, South China University of Technology, Guangzhou, China

2. Center for Intelligent Medicine, Greater Bay Area Institute of Precision Medicine (Guangzhou), School of Life Sciences, No. 6, 2nd Nanjiang Road, Nansha District, 511462 Guangzhou, China

3. Fudan University, Greater Bay Area Institute of Precision Medicine (Guangzhou), School of Life Sciences, No. 6, 2nd Nanjiang Road, Nansha District, 511462 Guangzhou, China

4. Hector Institute for Artificial Intelligence in Psychiatry, Central Institute of Mental Health, Medical Faculty Mannheim, M7, Mannheim 68161, Germany

5. Heidelberg University, Central Institute of Mental Health, Medical Faculty Mannheim, M7, Mannheim 68161, Germany

6. Department of Psychiatry and Psychotherapy, Central Institute of Mental Health, Medical Faculty Mannheim, J5, Mannheim 68159, Germany

7. Heidelberg University, Central Institute of Mental Health, Medical Faculty Mannheim, J5, Mannheim 68159, Germany

8. Center for Evolutionary Biology, School of Life Sciences, Fudan University, Shanghai, China

Abstract

Applying machine learning models to high-dimensional omics data remains a significant challenge. Integrating biological domain knowledge into these models has shown promise for stratifying predictor variables in more meaningful ways, leading to algorithms that are both more accurate and more generalizable. However, machine learning tools capable of incorporating such biological knowledge are still not widely available. To address this gap, we introduce BioM2, an R package for biologically informed multistage machine learning that predicts phenotypes from omics data. BioM2 uses biological information to stratify and aggregate high-dimensional biological data in the context of machine learning. In applications to genome-wide DNA methylation and transcriptome-wide gene expression data, BioM2 has been shown to improve predictive performance over traditional machine learning models that do not integrate biological knowledge. A key feature of BioM2 is its ability to rank predictor variables within biological categories, specifically Gene Ontology pathways. This functionality aids the interpretability of the results and also enables a subsequent modular network analysis of these variables, shedding light on the systems-level biology underlying the predicted outcome. BioM2 is available as a package on CRAN (https://cran.r-project.org/web/packages/BioM2/index.html).
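
The "stratify, then aggregate" idea described in the abstract can be illustrated with a minimal two-stage sketch. This is not the BioM2 API or the authors' implementation: it is written in Python rather than R, uses a hand-written logistic regression as a stand-in learner, and every name in it (`fit_two_stage`, `pathway_scores`, `pathway_A`, `pathway_B`, the toy data) is hypothetical. Stage 1 fits one model per pathway on that pathway's features only; stage 2 fits a model on the resulting pathway-level scores.

```python
import math
import random


def predict_proba(row, weights, bias):
    """Logistic prediction for one sample (logit clamped for numerical safety)."""
    z = bias + sum(w * v for w, v in zip(weights, row))
    z = max(-30.0, min(30.0, z))
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(X, y, epochs=300, lr=0.5):
    """Fit logistic regression by batch gradient descent; returns (weights, bias)."""
    n, p = len(X), len(X[0])
    weights, bias = [0.0] * p, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * p, 0.0
        for row, label in zip(X, y):
            err = predict_proba(row, weights, bias) - label
            for j, v in enumerate(row):
                grad_w[j] += err * v
            grad_b += err
        weights = [w - lr * g / n for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / n
    return weights, bias


def select_columns(X, idx):
    """Keep only the feature columns that belong to one pathway."""
    return [[row[j] for j in idx] for row in X]


def pathway_scores(X, pathways, stage1):
    """Aggregate each pathway's features into one predicted score per sample."""
    return [
        [predict_proba([row[j] for j in idx], *stage1[name])
         for name, idx in pathways.items()]
        for row in X
    ]


def fit_two_stage(X, y, pathways):
    """Stage 1: one model per pathway. Stage 2: one model on the pathway scores."""
    stage1 = {name: fit_logistic(select_columns(X, idx), y)
              for name, idx in pathways.items()}
    Z = pathway_scores(X, pathways, stage1)
    stage2 = fit_logistic(Z, y)
    return stage1, stage2


# Toy data (an assumption for illustration): 80 samples, 6 features.
# Features 0-2 carry the signal; features 3-5 are pure noise.
rng = random.Random(0)
X = [[rng.gauss(0, 1) for _ in range(6)] for _ in range(80)]
y = [1 if row[0] + row[1] + row[2] > 0 else 0 for row in X]

# Hypothetical feature-to-pathway mapping (a real one would come from annotations).
pathways = {"pathway_A": [0, 1, 2], "pathway_B": [3, 4, 5]}

stage1, stage2 = fit_two_stage(X, y, pathways)
Z = pathway_scores(X, pathways, stage1)
probs = [predict_proba(z, *stage2) for z in Z]
accuracy = sum((p > 0.5) == (label == 1) for p, label in zip(probs, y)) / len(y)
print(f"training accuracy: {accuracy:.2f}")
```

For brevity, the sketch scores the same samples that the stage-1 models were trained on. A realistic stacked pipeline would typically feed held-out (for example, cross-validated) stage-1 predictions into stage 2, so that the second stage does not inherit overfitting from the first.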

Funder

Greater Bay Area Institute of Precision Medicine

National Social Science Foundation

Natural Science Foundation of Guangdong Province

Shanghai Key Laboratory of Psychotic Disorders

Hector II Foundation

German Federal Ministry of Education and Research

German Center for Mental Health

Buchholz-Fachinformationsdienst GmbH

Lundbeck Foundation

Publisher

Oxford University Press (OUP)
