Extreme Gradient Boosting Tuned with Metaheuristic Algorithms for Predicting Myeloid NGS Onco-Somatic Variant Pathogenicity
Published: 2023-06-23
Issue: 7
Volume: 10
Page: 753
ISSN: 2306-5354
Container-title: Bioengineering
Language: en
Short-container-title: Bioengineering
Author:
Pellegrino Eric (1), Camilla Clara (1,2), Abbou Norman (3), Beaufils Nathalie (1), Pissier Christel (1), Gabert Jean (3), Nanni-Metellus Isabelle (1), Ouafik L’Houcine (1,2)
Affiliation:
1. APHM, CHU Nord, Service d’OncoBiologie, Aix Marseille University, 13015 Marseille, France
2. CNRS, INP, Inst Neurophysiopathol, Aix Marseille University, 13005 Marseille, France
3. APHM, CHU Nord, Service de Biochimie et de Biologie Moleculaire, Aix Marseille University, 13015 Marseille, France
Abstract
The advent of next-generation sequencing (NGS) technologies has revolutionized bioinformatics and genomics, particularly in onco-somatic genetics. NGS has provided a wealth of information about the genetic changes that underlie cancer and has considerably improved our ability to diagnose and treat it. However, the large volume of data generated by NGS makes variant interpretation difficult. To address this, machine learning algorithms such as Extreme Gradient Boosting (XGBoost) have become increasingly important tools in the analysis of NGS data. In this paper, we present a machine learning tool that uses XGBoost to predict the pathogenicity of variants detected with the myeloid panel. We optimized the performance of XGBoost using metaheuristic algorithms and compared our predictions with the decisions of biologists and with other prediction tools. The myeloid panel is a critical component in the diagnosis and treatment of myeloid neoplasms: sequencing this panel identifies specific genetic mutations, enabling more accurate diagnoses and tailored treatment plans. We trained the XGBoost algorithm on datasets collected from our myeloid panel NGS analyses. These data comprise 15,977 variants: 13,221 Single Nucleotide Variants (SNVs), 73 Multiple Nucleotide Variants (MNVs), and 2,683 insertions and deletions (INDELs). The best-performing XGBoost hyperparameters were found with Differential Evolution (DE), yielding an accuracy of 99.35%, a precision of 98.70%, a specificity of 98.71%, and a sensitivity of 100%.
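To illustrate how Differential Evolution can search a hyperparameter space, the following is a minimal sketch of the classic DE/rand/1/bin scheme in plain Python. It is not the authors' implementation. The function `surrogate_error` is a hypothetical stand-in for the real objective, which would train XGBoost with the candidate hyperparameters and return a validation error. The three hyperparameters (`learning_rate`, `max_depth`, `subsample`), their bounds, and the DE settings (`pop_size`, `f`, `cr`, `generations`) are likewise assumptions chosen for illustration.

```python
import random


def differential_evolution(objective, bounds, pop_size=20, f=0.8, cr=0.9,
                           generations=200, seed=0):
    """Minimise `objective` inside a box using DE/rand/1/bin."""
    rng = random.Random(seed)
    dim = len(bounds)
    # Initial population: uniform samples inside the bounds.
    pop = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(pop_size)]
    scores = [objective(ind) for ind in pop]
    for _ in range(generations):
        for i in range(pop_size):
            # Three distinct donors, none of them the target individual i.
            a, b, c = rng.sample([j for j in range(pop_size) if j != i], 3)
            j_rand = rng.randrange(dim)  # at least one gene comes from the mutant
            trial = []
            for j, (lo, hi) in enumerate(bounds):
                if rng.random() < cr or j == j_rand:
                    # Mutation: base vector plus scaled difference vector.
                    value = pop[a][j] + f * (pop[b][j] - pop[c][j])
                    value = min(max(value, lo), hi)  # clip to the search box
                else:
                    value = pop[i][j]  # binomial crossover keeps the target gene
                trial.append(value)
            trial_score = objective(trial)
            # Greedy selection: the trial replaces the target if it is no worse.
            if trial_score <= scores[i]:
                pop[i], scores[i] = trial, trial_score
    best = min(range(pop_size), key=scores.__getitem__)
    return pop[best], scores[best]


def surrogate_error(params):
    """Hypothetical objective standing in for an XGBoost validation error.

    A real objective would round integer hyperparameters such as max_depth,
    train a model, and return for example 1 - cross-validated accuracy.
    """
    learning_rate, max_depth, subsample = params
    return ((learning_rate - 0.1) ** 2
            + ((max_depth - 6.0) / 10.0) ** 2
            + (subsample - 0.8) ** 2)


# Assumed search ranges for learning_rate, max_depth, subsample.
bounds = [(0.01, 0.5), (2.0, 12.0), (0.5, 1.0)]
best_params, best_error = differential_evolution(surrogate_error, bounds)
print(best_params, best_error)
```

Because DE only needs objective values, the same loop applies unchanged when `surrogate_error` is replaced by a function that trains and evaluates a classifier. In that case each evaluation is expensive, so the population size and the number of generations determine the total training budget.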