AVADA Enables Automated Genetic Variant Curation Directly from the Full Text Literature

Published: 2018-11-04

Authors:

Birgmeier Johannes, Tierno Andrew P., Stenson Peter D., Deisseroth Cole A., Jagadeesh Karthik A., Cooper David N., Bernstein Jonathan A., Haeussler Maximilian, Bejerano Gill

Abstract

Purpose: The primary literature on human genetic diseases includes descriptions of pathogenic variants that are essential for clinical diagnosis. Variant databases such as ClinVar and HGMD collect pathogenic variants by manual curation. We aimed to automatically construct a freely accessible database of pathogenic variants directly from full-text articles about genetic disease.

Methods: AVADA (Automatically curated VAriant DAtabase) is a novel machine learning tool that uses natural language processing to automatically identify pathogenic variants and genes in full text of primary literature and converts them to genomic coordinates for rapid downstream use.

Results: AVADA automatically curated almost 60% of pathogenic variants deposited in HGMD, a 4.4-fold improvement over the current state of the art in automated variant extraction. AVADA also contains more than 60,000 pathogenic variants that are in HGMD, but not in ClinVar. In a cohort of 245 diagnosed patients, AVADA correctly annotated 38 previously described diagnostic variants, compared to 43 using HGMD, 20 using ClinVar and only 13 (wholly subsumed by AVADA and ClinVar’s) using the best automated abstracts-only based approach.

Conclusion: AVADA is the first machine learning tool that automatically curates a variants database directly from full text literature. AVADA is available upon publication at http://bejerano.stanford.edu/AVADA.
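To make the idea of identifying variant mentions in text concrete, the following is a minimal sketch and not AVADA's actual method, which the abstract describes only as a machine learning tool using natural language processing. It assumes variants are written in a simplified HGVS-like style (for example `c.1402C>T` or `p.Arg468Cys`) and uses two hypothetical regular expressions; the function name `find_variant_mentions` and the example sentence are invented for illustration. The conversion to genomic coordinates is not attempted here, since it would require transcript annotations that this record does not describe.

```python
import re

# Hypothetical, simplified patterns for HGVS-style mentions (illustrative only;
# real literature uses many more notations than these two).
CDNA_SUBSTITUTION = re.compile(r"\bc\.(\d+)([ACGT])>([ACGT])\b")
PROTEIN_SUBSTITUTION = re.compile(r"\bp\.([A-Z][a-z]{2})(\d+)([A-Z][a-z]{2})\b")


def find_variant_mentions(text):
    """Return (kind, matched_string) tuples in order of appearance in text."""
    found = []
    patterns = (("cdna", CDNA_SUBSTITUTION), ("protein", PROTEIN_SUBSTITUTION))
    for kind, pattern in patterns:
        for match in pattern.finditer(text):
            # Keep the start offset so mentions can be sorted by position.
            found.append((match.start(), kind, match.group(0)))
    found.sort()
    return [(kind, mention) for _, kind, mention in found]


if __name__ == "__main__":
    # Made-up sentence, not taken from any article.
    sentence = "We identified c.1402C>T (p.Arg468Cys) in the proband."
    for kind, mention in find_variant_mentions(sentence):
        print(kind, mention)
```

A pattern-based pass like this would miss most real-world notations and cannot tell pathogenic from benign mentions, which is presumably part of why a learned approach over full text is attractive.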

Publisher

Cold Spring Harbor Laboratory

Cited by 2 articles.

1. Gestational Diabetes and its Therapeutic Nutritional Care; Pakistan BioMedical Journal; 2022-05-31

2. AMELIE 3: Fully Automated Mendelian Patient Reanalysis at Under 1 Alert per Patient per Year; 2021-01-04