Abstract
Next-generation sequencing (NGS) has been widely used for calling biological variants. The gold-standard approach for assessing the ability of a computational method to call a specific variant is to perform NGS wet-lab experiments on samples known to harbor that variant. However, wet-lab experiments are labor-intensive and time-consuming, and rare variants may be absent from any given population sample. These two issues are exacerbated for SafeSeqS, which enabled liquid biopsy and minimal residual disease (MRD) detection in cell-free DNA by using unique molecular identifiers (UMIs) to detect and/or correct NGS errors. We therefore developed SafeMut, the first UMI-aware NGS small-variant simulator, which also accounts for the overdispersion of allele fractions. We used tumor-normal paired sequencing runs from the SEQC2 somatic reference sets and cell-free DNA data sets to assess the performance of BamSurgeon, VarBen, and SafeMut. We observed that, unlike those of BamSurgeon and VarBen, the allele-fraction distribution of variants simulated by SafeMut closely resembles the distribution obtained from technical replicates of wet-lab experiments. SafeMut thus provides accurate simulation of small variants in NGS data, helping to assess how well a bioinformatics pipeline can call these variants.
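To illustrate what "overdispersion of allele fractions" means, the following is a minimal sketch, assuming a beta-binomial model. This is a common way to represent extra-binomial variance, but the abstract does not state which model SafeMut uses, so this is not SafeMut's method. The helpers `simulate_alt_counts` and `allele_fraction_variance` and the correlation parameter `rho` are hypothetical. With `rho = 0`, each read carries the alternate allele independently (binomial). With `rho > 0`, the per-site allele fraction is first drawn from a beta distribution, which inflates the variance of the observed allele fraction by a factor of `1 + (depth - 1) * rho`.

```python
import random
import statistics


def simulate_alt_counts(depth, mean_af, rho, n_sites, seed=0):
    """Hypothetical helper: draw alt-allele read counts at n_sites loci.

    rho == 0 gives plain binomial sampling; rho > 0 gives beta-binomial
    sampling, where rho is the intra-class correlation that inflates the
    count variance by a factor of 1 + (depth - 1) * rho.
    """
    if not 0 <= rho < 1:
        raise ValueError("rho must be in [0, 1)")
    if not 0 < mean_af < 1:
        raise ValueError("mean_af must be in (0, 1)")
    rng = random.Random(seed)
    counts = []
    for _ in range(n_sites):
        if rho > 0:
            # alpha + beta chosen so that 1 / (alpha + beta + 1) == rho
            s = (1 - rho) / rho
            p = rng.betavariate(mean_af * s, (1 - mean_af) * s)
        else:
            p = mean_af
        # each of the `depth` reads independently carries the alt allele with prob p
        counts.append(sum(rng.random() < p for _ in range(depth)))
    return counts


def allele_fraction_variance(counts, depth):
    """Population variance of the observed allele fractions across sites."""
    return statistics.pvariance([c / depth for c in counts])


if __name__ == "__main__":
    depth, af = 500, 0.02
    binom = simulate_alt_counts(depth, af, 0.0, 1000, seed=1)
    betabin = simulate_alt_counts(depth, af, 0.02, 1000, seed=2)
    print("binomial AF variance:     ", allele_fraction_variance(binom, depth))
    print("beta-binomial AF variance:", allele_fraction_variance(betabin, depth))
```

With these settings, the expected binomial allele-fraction variance is 0.02 × 0.98 / 500 ≈ 3.9 × 10⁻⁵. The beta-binomial variance is expected to be about 1 + 499 × 0.02 ≈ 11 times larger. A simulator that ignores this extra spread would produce allele fractions that are tighter than those seen across wet-lab technical replicates.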
Publisher
Cold Spring Harbor Laboratory