Author:
Hiltemann Saskia, Jenster Guido, Trapman Jan, van der Spek Peter, Stubbs Andrew
Abstract
Tumor analyses commonly employ a correction with a matched normal (MN), a sample from healthy tissue of the same individual, in order to distinguish germline mutations from somatic mutations. Since the majority of variants found in an individual are thought to be common within the population, we constructed a set of 931 samples from healthy, unrelated individuals, originating from two different sequencing platforms, to serve as a virtual normal (VN) when no such matched normal sample is available. Our approach removed (1) >96% of the germline variants also removed by the MN sample and (2) an additional 2%–8% of variants not corrected for by the matched normal. Combining the VN with the MN significantly improved the correction for polymorphisms, by up to ∼30% compared with the MN alone and ∼15% compared with the VN alone. We determined that the number of unrelated genomes needed to correct at least as efficiently as the MN is about 200 for structural variations (SVs) and about 400 for single-nucleotide variants (SNVs) and indels. In addition, we propose that removing common variants with purely position-based methods is inaccurate and yields additional false-positive somatic variants, and that more sophisticated algorithms, capable of leveraging information about the region surrounding each variant, are needed for optimal accuracy. Our VN correction method can be used to analyze any list of variants, regardless of the sequencing platform of origin. The VN methodology is available for use on our public Galaxy server.
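The basic idea of correcting a tumor variant list with a virtual normal, optionally combined with a matched normal, can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' implementation: the `Variant` tuple, the function names, and the `min_normals` threshold are all hypothetical, and variants are compared by exact chromosome, position, and allele. The abstract itself argues that such purely position-based matching is inaccurate, so the sketch shows the concept rather than a recommended algorithm.

```python
from collections import Counter
from typing import Iterable, List, NamedTuple


class Variant(NamedTuple):
    """Hypothetical minimal variant record (not the paper's data model)."""
    chrom: str
    pos: int
    ref: str
    alt: str


def build_virtual_normal(normals: Iterable[Iterable[Variant]]) -> Counter:
    """Count, for each variant, how many normal samples carry it."""
    counts: Counter = Counter()
    for sample in normals:
        # A set ensures each sample contributes at most once per variant.
        counts.update(set(sample))
    return counts


def filter_somatic_candidates(
    tumor: Iterable[Variant],
    vn_counts: Counter,
    matched_normal: Iterable[Variant] = (),
    min_normals: int = 1,  # assumed threshold, chosen for illustration
) -> List[Variant]:
    """Keep tumor variants absent from the matched normal and seen in
    fewer than `min_normals` samples of the virtual normal."""
    mn = set(matched_normal)
    return [v for v in tumor if v not in mn and vn_counts[v] < min_normals]
```

With a panel built by `build_virtual_normal`, calling `filter_somatic_candidates(tumor, vn_counts)` applies the VN alone, while passing `matched_normal` as well mirrors the combined VN plus MN correction described in the abstract.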
Funder
Center for Translational Molecular Medicine
TraIT project
Netherlands Organization for Scientific Research
Biobanking and Biomolecular Research Infrastructure Netherlands
Publisher
Cold Spring Harbor Laboratory
Subject
Genetics (clinical), Genetics
Cited by
64 articles.