ClairS: a deep-learning method for long-read somatic small variant calling

Published: 2023-08-21

Author:

Zheng Zhenxian, Su Junhao, Chen Lei, Lee Yan-Lam, Lam Tak-Wah, Luo Ruibang

Abstract

Identifying somatic variants in tumor samples is a crucial task, which is often performed using statistical methods and heuristic filters applied to short-read data. However, with the increasing demand for long-read somatic variant calling, existing methods have fallen short. To address this gap, we present ClairS, the first deep-learning-based, long-read somatic small variant caller. ClairS was trained on a massive set of synthetic somatic variants with diverse coverages and variant allele frequencies (VAFs), enabling it to accurately detect a wide range of somatic variants from paired tumor and normal samples. We evaluated ClairS using the latest Nanopore Q20+ HCC1395-HCC1395BL dataset. With 50-fold tumor and 25-fold normal coverage, ClairS achieved 93.01% precision and 86.86% recall for single nucleotide variants (SNVs), and 66.54% precision and 66.89% recall for somatic insertions and deletions (indels). Applying ClairS to short-read datasets from multiple sources showed comparable or better performance than Strelka2 and Mutect2. Our findings suggest that improved read phasing enabled by long-read sequencing is key to accurate long-read SNV calling, especially for variants with low VAF. Through experiments across various coverage, purity, and contamination settings, we demonstrated that ClairS is a reliable somatic variant caller. ClairS is open-source at https://github.com/HKU-BAL/ClairS.
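
For readers less familiar with the metrics quoted in the abstract, the following is a minimal sketch of how precision, recall, their harmonic mean (F1), and VAF are conventionally defined. It is not taken from ClairS or its evaluation scripts. The function names and the read counts are hypothetical; only the precision and recall figures come from the abstract, and the F1 values are derived here rather than reported by the authors.

```python
# Illustrative definitions only; not part of ClairS.

def precision(true_pos: int, false_pos: int) -> float:
    """Fraction of called variants that are real."""
    called = true_pos + false_pos
    return true_pos / called if called else 0.0


def recall(true_pos: int, false_neg: int) -> float:
    """Fraction of real variants that were called."""
    real = true_pos + false_neg
    return true_pos / real if real else 0.0


def f1_score(p: float, r: float) -> float:
    """Harmonic mean of precision and recall."""
    return 2 * p * r / (p + r) if (p + r) else 0.0


def vaf(alt_reads: int, total_reads: int) -> float:
    """Variant allele frequency: share of reads supporting the variant."""
    return alt_reads / total_reads if total_reads else 0.0


# Precision/recall pairs quoted in the abstract (50x tumor, 25x normal).
snv_f1 = f1_score(0.9301, 0.8686)
indel_f1 = f1_score(0.6654, 0.6689)
print(f"Derived SNV F1:   {snv_f1:.4f}")
print(f"Derived indel F1: {indel_f1:.4f}")

# Hypothetical low-VAF site: 5 variant-supporting reads out of 50.
print(f"Example VAF: {vaf(5, 50):.2f}")
```

A low VAF such as the hypothetical one above means few reads support the variant, which is the setting where the abstract suggests read phasing matters most.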

Publisher

Cold Spring Harbor Laboratory

Cited by 3 articles.

1. From GPUs to AI and quantum: three waves of acceleration in bioinformatics;Drug Discovery Today;2024-06

2. Nanopore sequencing with unique molecular identifiers enables accurate mutation analysis and haplotyping in the complex Lipoprotein(a) KIV-2 VNTR;2024-03-05

3. Detecting Somatic Mutations Without Matched Normal Samples Using Long Reads;2024-02-29