Coracle—a machine learning framework to identify bacteria associated with continuous variables

Author:

Staab Sebastian (1), Cardénas Anny (1, 2), Peixoto Raquel S (3), Schreiber Falk (4, 5), Voolstra Christian R (1)

Affiliation:

1. Department of Biology, University of Konstanz, Konstanz 78457, Germany

2. Department of Biology, American University, Washington, DC, 20016, USA

3. Computational Biology Research Center (CBRC) and Red Sea Research Center (RSRC), Biological and Environmental Sciences and Engineering Division (BESE), King Abdullah University of Science and Technology (KAUST), Thuwal 23955, Saudi Arabia

4. Department of Computer and Information Science, University of Konstanz, Konstanz 78457, Germany

5. Faculty of Information Technology, Monash University, 3168, Australia

Abstract

Summary: We present Coracle, an artificial intelligence (AI) framework that can identify associations between bacterial communities and continuous variables. Coracle uses an ensemble approach of prominent feature selection methods and machine learning (ML) models to identify features, i.e. bacteria, associated with a continuous variable, e.g. host thermal tolerance. The results are aggregated into a score that incorporates the performances of the different ML models and the respective feature importance, while also considering the robustness of feature selection. Additionally, regression coefficients provide first insights into the direction of the association. We show the utility of Coracle by analyzing associations between bacterial composition data (i.e. 16S rRNA Amplicon Sequence Variants, ASVs) and coral thermal tolerance (i.e. standardized short-term heat stress-derived diagnostics). This analysis identified high-scoring bacterial taxa that were previously found associated with coral thermal tolerance. Coracle scales with feature number and performs well with hundreds to thousands of features, corresponding to the typical size of current datasets. Coracle performs best if run at a higher taxonomic level first (e.g. order or family) to identify groups of interest that can subsequently be run at the ASV level.

Availability and implementation: Coracle can be accessed via a dedicated web server that allows free and simple access: http://www.micportal.org/coracle/index. The underlying code is open-source and available via GitHub: https://github.com/SebastianStaab/coracle.git.
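The abstract describes the aggregated score only in words, so the following is a minimal illustrative sketch, not Coracle's actual formula. It assumes that each model's feature importances are normalized to sum to one, weighted by that model's performance (for example a cross-validated R², clipped at zero), and then multiplied by how often the feature was selected across repeated runs. The function name `aggregate_scores`, the model names, and the taxon labels are hypothetical placeholders.

```python
from typing import Dict


def aggregate_scores(
    model_performance: Dict[str, float],
    feature_importance: Dict[str, Dict[str, float]],
    selection_frequency: Dict[str, float],
) -> Dict[str, float]:
    """Combine model performance, feature importance, and selection
    robustness into one score per feature (hypothetical scheme)."""
    # Models that perform worse than a constant predictor get zero weight.
    weights = {m: max(p, 0.0) for m, p in model_performance.items()}
    total_weight = sum(weights.values())
    if total_weight == 0:
        return {feature: 0.0 for feature in selection_frequency}

    scores: Dict[str, float] = {}
    for feature, frequency in selection_frequency.items():
        weighted_importance = 0.0
        for model, weight in weights.items():
            importances = feature_importance.get(model, {})
            norm = sum(abs(v) for v in importances.values())
            if norm == 0:
                continue  # this model assigned no importance to any feature
            # Normalize so that models with different scales are comparable.
            share = abs(importances.get(feature, 0.0)) / norm
            weighted_importance += weight * share
        # Scale by the fraction of runs in which the feature was selected.
        scores[feature] = frequency * weighted_importance / total_weight
    return scores


if __name__ == "__main__":
    # Placeholder inputs for illustration only, not results from the paper.
    performance = {"model_1": 0.5, "model_2": 0.5}
    importance = {
        "model_1": {"taxon_A": 1.0, "taxon_B": 1.0},
        "model_2": {"taxon_A": 3.0, "taxon_B": 1.0},
    }
    frequency = {"taxon_A": 1.0, "taxon_B": 0.5}
    ranked = sorted(
        aggregate_scores(performance, importance, frequency).items(),
        key=lambda item: item[1],
        reverse=True,
    )
    for taxon, score in ranked:
        print(f"{taxon}: {score:.4f}")
```

Under this assumed scheme, a feature scores highly only if well-performing models rank it as important and it is selected consistently, which mirrors the three ingredients named in the abstract. The sign of a regression coefficient, which the abstract mentions as a hint at the direction of association, is deliberately left out of the score here.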

Funder

King Abdullah University of Science and Technology

Publisher

Oxford University Press (OUP)

Subject

Computational Mathematics, Computational Theory and Mathematics, Computer Science Applications, Molecular Biology, Biochemistry, Statistics and Probability

Cited by 1 article.
