VCFcontam: A Machine Learning Approach to Estimate Cross-Sample Contamination from Variant Call Data

Author:

McCartney-Melstad EvanORCID,Bi Ke,Han James,Foo Catherine K.

Abstract

AbstractThe quality of genotyping calls resulting from DNA sequencing is reliant on high quality starting genetic material. One factor that can reduce sample quality and lead to misleading genotyping results is genetic contamination of a sample by another source, such as cells or DNA from another sample of the same or different species. Cross-sample contamination by individuals of the same species is particularly difficult to detect in DNA sequencing data, because the contaminating sequence reads look very similar to those of the intended base sample. We introduce a new method that uses a support vector regression model trained on in silico contaminated datasets to predict empirical contamination using a collection of variables drawn from VCF files, including the fraction of sites that are heterozygous, the fraction of heterozygous sites with imbalanced allele counts, and parameters describing distributions fit to heterozygous allele fractions in a sample. We use the method described here to train a model that can accurately predict the extent of cross-sample contamination within 1% of the actual fraction, for simulated contaminated samples in the 0-5% contamination range, directly from the VCF file.DefinitionsLesser alleleThe allele in a heterozygous position that received less sequencing read support (which may be either the REF or ALT allele).Lesser allele fraction (LAF)The number of sequencing reads supporting the less frequently observed allele divided by the sum of reads supporting both alleles in the genotype at a given genomic position.

Publisher

Cold Spring Harbor Laboratory

同舟云学术

1.学者识别学者识别

2.学术分析学术分析

3.人才评估人才评估

"同舟云学术"是以全球学者为主线,采集、加工和组织学术论文而形成的新型学术文献查询和分析系统,可以对全球学者进行文献检索和人才价值评估。用户可以通过关注某些学科领域的顶尖人物而持续追踪该领域的学科进展和研究前沿。经过近期的数据扩容,当前同舟云学术共收录了国内外主流学术期刊6万余种,收集的期刊论文及会议论文总量共计约1.5亿篇,并以每天添加12000余篇中外论文的速度递增。我们也可以为用户提供个性化、定制化的学者数据。欢迎来电咨询!咨询电话:010-8811{复制后删除}0370

www.globalauthorid.com

TOP

Copyright © 2019-2024 北京同舟云网络信息技术有限公司
京公网安备11010802033243号  京ICP备18003416号-3