CNVoyant: A Highly Performant and Explainable Multi-Classifier Machine Learning Approach for Determining the Clinical Significance of Copy Number Variants

Published: 2024-04-30

Author:

Schuetz Robert J.¹, Ceyhan Defne¹, Antoniou Austin A.¹, Chaudhari Bimal P.¹, White Peter¹

Affiliation:

1. The Abigail Wexner Research Institute at Nationwide Children’s Hospital

Abstract

The precise classification of copy number variants (CNVs) presents a significant challenge in genomic medicine, primarily due to the complex nature of CNVs and their diverse impact on genetic disorders. This complexity is compounded by the limitations of existing methods in accurately distinguishing between benign, uncertain, and pathogenic CNVs. Addressing this gap, we introduce CNVoyant, a machine learning-based multi-class framework designed to enhance the clinical significance classification of CNVs. Trained on a comprehensive dataset of 52,176 ClinVar entries across pathogenic, uncertain, and benign classifications, CNVoyant incorporates a broad spectrum of genomic features, including genome position, disease-gene annotations, dosage sensitivity, and conservation scores. Models to predict the clinical significance of copy number gains and losses were trained independently. Final models were selected after testing 29 machine learning architectures and 10,000 hyperparameter combinations each for deletions and duplications via 5-fold cross-validation. We validate the performance of CNVoyant by leveraging a comprehensive set of 21,574 CNVs from the DECIPHER database, a highly regarded resource known for its extensive catalog of chromosomal imbalances linked to clinical outcomes. Compared to alternative approaches, CNVoyant shows marked improvements in precision-recall and ROC AUC metrics for binary pathogenic classifications while going one step further, offering multi-classification of clinical significance and corresponding SHAP explainability plots. This large-scale validation demonstrates CNVoyant’s superior accuracy and underscores its potential to aid genomic researchers and clinical geneticists in interpreting the clinical implications of real CNVs.
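The abstract compares a three-class model against other tools on a binary pathogenic task using ROC AUC. The sketch below illustrates one way such a comparison could be set up: it treats the predicted pathogenic probability as the binary score and computes ROC AUC with the rank (Mann-Whitney) formulation. This is not the authors' evaluation code. The class names, the helper functions `pathogenic_score` and `roc_auc`, the choice to count "uncertain" as non-pathogenic, and the toy predictions are all assumptions made for illustration.

```python
from typing import Dict, Sequence


def pathogenic_score(probs: Dict[str, float]) -> float:
    """Collapse a three-class probability vector to one binary score.

    Assumption: the pathogenic-class probability is used as the score.
    """
    return probs["pathogenic"]


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """ROC AUC as the fraction of (positive, negative) pairs ranked correctly.

    Tied pairs count as half a correct ranking.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Made-up (true label, predicted probabilities) pairs, for illustration only.
toy = [
    ("pathogenic", {"benign": 0.05, "uncertain": 0.15, "pathogenic": 0.80}),
    ("pathogenic", {"benign": 0.30, "uncertain": 0.30, "pathogenic": 0.40}),
    ("benign", {"benign": 0.70, "uncertain": 0.20, "pathogenic": 0.10}),
    ("benign", {"benign": 0.20, "uncertain": 0.30, "pathogenic": 0.50}),
    ("uncertain", {"benign": 0.30, "uncertain": 0.50, "pathogenic": 0.20}),
]

# Assumption: anything that is not pathogenic is the negative class.
labels = [1 if truth == "pathogenic" else 0 for truth, _ in toy]
scores = [pathogenic_score(probs) for _, probs in toy]
auc = roc_auc(labels, scores)
print(f"binary pathogenic ROC AUC: {auc:.3f}")
```

The pairwise loop is quadratic in the number of variants, which is acceptable for a small illustration; a sort-based rank computation would be the usual choice at the scale of the validation set described in the abstract.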

Publisher

Research Square Platform LLC
