Diagnosis of a Single-Nucleotide Variant in Whole-Exome Sequencing Data for Patients With Inherited Diseases: Machine Learning Study Using Artificial Intelligence Variant Prioritization-Reference-Cited by-同舟云学术

Diagnosis of a Single-Nucleotide Variant in Whole-Exome Sequencing Data for Patients With Inherited Diseases: Machine Learning Study Using Artificial Intelligence Variant Prioritization

Published:2022-09-15 Issue:1 Volume:3 Page:e37701
ISSN:2563-3570
Container-title:JMIR Bioinformatics and Biotechnology
language:en
Short-container-title:JMIR Bioinform Biotech

Author:

Huang Yu-Shan^ORCID,Hsu Ching^ORCID,Chune Yu-Chang^ORCID,Liao I-Cheng^ORCID,Wang Hsin^ORCID,Lin Yi-Lin^ORCID,Hwu Wuh-Liang^ORCID,Lee Ni-Chung^ORCID,Lai Feipei^ORCID

Abstract

Background In recent years, thanks to the rapid development of next-generation sequencing (NGS) technology, an entire human genome can be sequenced in a short period. As a result, NGS technology is now being widely introduced into clinical diagnosis practice, especially for diagnosis of hereditary disorders. Although the exome data of single-nucleotide variant (SNV) can be generated using these approaches, processing the DNA sequence data of a patient requires multiple tools and complex bioinformatics pipelines. Objective This study aims to assist physicians to automatically interpret the genetic variation information generated by NGS in a short period. To determine the true causal variants of a patient with genetic disease, currently, physicians often need to view numerous features on every variant manually and search for literature in different databases to understand the effect of genetic variation. Methods We constructed a machine learning model for predicting disease-causing variants in exome data. We collected sequencing data from whole-exome sequencing (WES) and gene panel as training set, and then integrated variant annotations from multiple genetic databases for model training. The model built ranked SNVs and output the most possible disease-causing candidates. For model testing, we collected WES data from 108 patients with rare genetic disorders in National Taiwan University Hospital. We applied sequencing data and phenotypic information automatically extracted by a keyword extraction tool from patient’s electronic medical records into our machine learning model. Results We succeeded in locating 92.5% (124/134) of the causative variant in the top 10 ranking list among an average of 741 candidate variants per person after filtering. AI Variant Prioritizer was able to assign the target gene to the top rank for around 61.1% (66/108) of the patients, followed by Variant Prioritizer, which assigned it for 44.4% (48/108) of the patients. The cumulative rank result revealed that our AI Variant Prioritizer has the highest accuracy at ranks 1, 5, 10, and 20. It also shows that AI Variant Prioritizer presents better performance than other tools. After adopting the Human Phenotype Ontology (HPO) terms by looking up the databases, the top 10 ranking list can be increased to 93.5% (101/108). Conclusions We successfully applied sequencing data from WES and free-text phenotypic information of patient’s disease automatically extracted by the keyword extraction tool for model training and testing. By interpreting our model, we identified which features of variants are important. Besides, we achieved a satisfactory result on finding the target variant in our testing data set. After adopting the HPO terms by looking up the databases, the top 10 ranking list can be increased to 93.5% (101/108). The performance of the model is similar to that of manual analysis, and it has been used to help National Taiwan University Hospital with a genetic diagnosis.

Publisher

JMIR Publications Inc.

Subject

General Earth and Planetary Sciences,General Environmental Science

Reference68 articles.

1. What is next generation sequencing?

2. Artificial intelligence (AI)-assisted exome reanalysis greatly aids in the identification of new positive cases and reduces analysis time in a clinical diagnostic laboratory

3. DNA sequencing with chain-terminating inhibitors

4. Online Mendelian Inheritance In Man (OMIM)

Cited by 1 articles. 订阅此论文施引文献订阅此论文施引文献，注册后可以免费订阅5篇论文的施引文献，订阅后可以查看论文全部施引文献

1. Uncovering hidden genetic risk factors for breast and ovarian cancers in BRCA-negative women: a machine learning approach in the Saudi population;PeerJ Computer Science;2024-04-19