Inference and visualization of DNA damage patterns using a grade of membership model

Authors:

Al-Asadi Hussein (1, 2), Dey Kushal K (2), Novembre John (1, 3), Stephens Matthew (2, 3)

Affiliations:

1. Committee on Evolutionary Biology, University of Chicago, Chicago, IL, USA

2. Department of Statistics, University of Chicago, Chicago, IL, USA

3. Department of Human Genetics, University of Chicago, Chicago, IL, USA

Abstract

Motivation: Quality control plays a major role in the analysis of ancient DNA (aDNA). One key step in this quality control is the assessment of DNA damage: aDNA contains unique signatures of DNA damage that distinguish it from modern DNA, so analyses of damage patterns can help confirm that the DNA sequences obtained are from endogenous aDNA rather than from modern contamination. Predominant signatures of DNA damage include a high frequency of cytosine to thymine substitutions (C-to-T) at the ends of fragments, and elevated rates of purines (A and G) before the 5′ strand-breaks. Existing QC procedures assess damage by simply plotting, for each sample, the C-to-T mismatch rate along the read and the base composition before the 5′ strand-breaks. Here we present a more flexible and comprehensive model-based approach to infer and visualize damage patterns in aDNA, implemented in an R package, aRchaic. This approach is based on a ‘grade of membership’ model (also known as an ‘admixture’ or ‘topic’ model) in which each sample has an estimated grade of membership in each of K damage profiles that are themselves estimated from the data.

Results: We illustrate aRchaic on data from several aDNA studies and on modern individuals from the 1000 Genomes Project (1000 Genomes Project Consortium, 2012). Here, aRchaic clearly distinguishes modern from ancient samples irrespective of DNA extraction, lab and sequencing protocols. Additionally, through an in-silico contamination experiment, we show that the aRchaic grades of membership reflect relative levels of exogenous modern contamination. Together, the outputs of aRchaic provide a concise visual summary of DNA damage patterns, as well as of other processes generating mismatches in the data.

Availability and implementation: aRchaic is available for download from https://www.github.com/kkdey/aRchaic.

Supplementary information: Supplementary data are available at Bioinformatics online.
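To make the grade of membership idea in the abstract concrete, the following is a minimal sketch in Python, not the aRchaic implementation (aRchaic is an R package and its interface is not shown here). It assumes each sample is summarized as a vector of mismatch-feature counts, models each sample's feature frequencies as a membership-weighted mixture of K profiles, and fits memberships and profiles with a plain EM algorithm. The function names `normalize` and `fit_gom`, the three feature columns and the toy counts are all hypothetical and chosen only for illustration.

```python
import random


def normalize(values):
    """Scale non-negative values to sum to 1 (uniform if they sum to 0)."""
    total = sum(values)
    if total <= 0:
        return [1.0 / len(values)] * len(values)
    return [v / total for v in values]


def fit_gom(counts, k, n_iter=500, seed=0):
    """Fit a grade of membership (topic) model to a count matrix by EM.

    counts: list of N rows, each a list of M non-negative feature counts.
    Returns (q, f), where q[n][j] is the membership of sample n in
    profile j and f[j][m] is the relative frequency of feature m in
    profile j. Every row of q and of f sums to 1.
    """
    rng = random.Random(seed)
    n_samples, n_features = len(counts), len(counts[0])
    # Random positive starting values (an arbitrary choice for this sketch).
    q = [normalize([rng.random() + 0.1 for _ in range(k)])
         for _ in range(n_samples)]
    f = [normalize([rng.random() + 0.1 for _ in range(n_features)])
         for _ in range(k)]
    for _ in range(n_iter):
        q_new = [[0.0] * k for _ in range(n_samples)]
        f_new = [[0.0] * n_features for _ in range(k)]
        for n in range(n_samples):
            for m in range(n_features):
                c = counts[n][m]
                if c == 0:
                    continue
                # E-step: split the count c across profiles in proportion
                # to membership times profile frequency.
                weights = [q[n][j] * f[j][m] for j in range(k)]
                total = sum(weights)
                if total == 0:
                    continue
                for j in range(k):
                    share = c * weights[j] / total
                    q_new[n][j] += share
                    f_new[j][m] += share
        # M-step: renormalize the expected counts.
        q = [normalize(row) for row in q_new]
        f = [normalize(row) for row in f_new]
    return q, f


# Hypothetical toy data: rows are samples, columns are three made-up
# mismatch features (for example "C-to-T near read end",
# "C-to-T in read interior" and "other mismatch").
toy_counts = [
    [90, 5, 5],    # sample dominated by end-of-read C-to-T
    [85, 10, 5],   # sample dominated by end-of-read C-to-T
    [5, 50, 45],   # sample with little end-of-read C-to-T
    [10, 45, 45],  # sample with little end-of-read C-to-T
]
memberships, profiles = fit_gom(toy_counts, k=2)
for row in memberships:
    print([round(x, 3) for x in row])
```

In this sketch each printed row is one sample's estimated grades of membership across the two profiles. The profile labels are arbitrary, so which column corresponds to the damage-like profile can change with the random seed.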

Funder

National Institutes of Health (NIH)

National Science Foundation

Publisher

Oxford University Press (OUP)

Subject

Computational Mathematics, Computational Theory and Mathematics, Computer Science Applications, Molecular Biology, Biochemistry, Statistics and Probability

References (41 articles)

1. An integrated map of genetic variation from 1,092 human genomes; 1000 Genomes Project Consortium; Nature, 2012

2. Fast model-based estimation of ancestry in unrelated individuals; Alexander; Genome Res, 2009

3. Population genomics of Bronze Age Eurasia; Allentoft; Nature, 2015

4. Latent Dirichlet allocation; Blei; J. Mach. Learn. Res, 2003

5. Patterns of damage in genomic DNA sequences from a Neandertal; Briggs; Proc. Natl. Acad. Sci. USA, 2007
