Extreme Gradient Boosting Tuned with Metaheuristic Algorithms for Predicting Myeloid NGS Onco-Somatic Variant Pathogenicity

Author:

Pellegrino Eric (1), Camilla Clara (1,2), Abbou Norman (3), Beaufils Nathalie (1), Pissier Christel (1), Gabert Jean (3), Nanni-Metellus Isabelle (1), Ouafik L’Houcine (1,2)

Affiliation:

1. APHM, CHU Nord, Service d’OncoBiologie, Aix Marseille University, 13015 Marseille, France

2. CNRS, INP, Inst Neurophysiopathol, Aix Marseille University, 13005 Marseille, France

3. APHM, CHU Nord, Service de Biochimie et de Biologie Moleculaire, Aix Marseille University, 13015 Marseille, France

Abstract

The advent of next-generation sequencing (NGS) technologies has revolutionized bioinformatics and genomics, particularly in onco-somatic genetics. NGS has provided a wealth of information about the genetic changes that underlie cancer and has considerably improved our ability to diagnose and treat it. However, the large amount of data generated by NGS makes the variants difficult to interpret. To address this, machine learning algorithms such as Extreme Gradient Boosting (XGBoost) have become increasingly important tools in the analysis of NGS data. In this paper, we present a machine learning tool that uses XGBoost to predict the pathogenicity of a mutation in the myeloid panel. We optimized the performance of XGBoost using metaheuristic algorithms and compared our predictions with the decisions of biologists and with other prediction tools. The myeloid panel is a critical component in the diagnosis and treatment of myeloid neoplasms: sequencing this panel identifies specific genetic mutations, enabling more accurate diagnoses and tailored treatment plans. We trained the XGBoost algorithm on datasets collected from our myeloid panel NGS analysis. The data comprise 15,977 variants: 13,221 Single Nucleotide Variants (SNVs), 73 Multiple Nucleotide Variants (MNVs), and 2,683 insertions and deletions (INDELs). The optimal XGBoost hyperparameters were found with Differential Evolution (DE), yielding an accuracy of 99.35%, a precision of 98.70%, a specificity of 98.71%, and a sensitivity of 100%.
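To make the tuning step more concrete, the following is a minimal sketch of a Differential Evolution loop (the common rand/1/bin variant) in plain Python. It is not the authors' implementation. The objective `toy_validation_error`, the search bounds, and the control parameters `f`, `cr`, and `pop_size` are all illustrative assumptions. In a real setting the objective would train XGBoost with the candidate hyperparameters and return a validation error, and integer hyperparameters such as the tree depth would be rounded before use.

```python
import random


def differential_evolution(objective, bounds, pop_size=20, f=0.6, cr=0.9,
                           generations=100, seed=0):
    """Minimize `objective` over box `bounds` with DE/rand/1/bin (sketch)."""
    rng = random.Random(seed)
    dim = len(bounds)
    # Random initial population inside the bounds.
    pop = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(pop_size)]
    scores = [objective(ind) for ind in pop]

    for _ in range(generations):
        for i in range(pop_size):
            # Pick three distinct individuals, all different from i.
            a, b, c = rng.sample([j for j in range(pop_size) if j != i], 3)
            j_rand = rng.randrange(dim)  # guarantees at least one mutated gene
            trial = []
            for j, (lo, hi) in enumerate(bounds):
                if j == j_rand or rng.random() < cr:
                    value = pop[a][j] + f * (pop[b][j] - pop[c][j])
                    value = min(max(value, lo), hi)  # clip to the bounds
                else:
                    value = pop[i][j]
                trial.append(value)
            # Greedy selection: keep the trial only if it is no worse.
            trial_score = objective(trial)
            if trial_score <= scores[i]:
                pop[i], scores[i] = trial, trial_score

    best = min(range(pop_size), key=scores.__getitem__)
    return pop[best], scores[best]


def toy_validation_error(params):
    """Hypothetical stand-in for a cross-validated XGBoost error.

    Assumption for illustration only: the error is smallest at
    learning_rate = 0.1 and max_depth = 6.
    """
    learning_rate, max_depth = params
    return (learning_rate - 0.1) ** 2 + ((max_depth - 6.0) / 10.0) ** 2


# Assumed search ranges: (learning_rate, max_depth treated as continuous).
BOUNDS = [(0.01, 0.5), (2.0, 12.0)]
best_params, best_error = differential_evolution(toy_validation_error, BOUNDS)
print(best_params, best_error)
```

Each individual in the population is one candidate hyperparameter vector. Because selection is greedy, the best score in the population never gets worse from one generation to the next, which is one reason DE is a convenient choice for expensive, non-differentiable objectives such as cross-validated model error.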

Publisher

MDPI AG

Subject

Bioengineering

