Identifying, understanding, and correcting technical artifacts on the sex chromosomes in next-generation sequencing data

Author:

Webster Timothy H (1, 2), Couse Madeline (3, 4), Grande Bruno M (5), Karlins Eric (6), Phung Tanya N (7), Richmond Phillip A (4, 8), Whitford Whitney (9, 10), Wilson Melissa A (1, 11)

Affiliation:

1. School of Life Sciences, Arizona State University, 427 E Tyler Mall, Tempe, AZ 85281, USA

2. Department of Anthropology, University of Utah, 260 S Central Drive, Carolyn and Kem Gardner Commons, Suite 4625, Salt Lake City, UT 84112, USA

3. University of British Columbia, 2329 West Mall, Vancouver, BC, V6T 1Z4, Canada

4. BC Children's Hospital Research Institute, 950 W 28th Avenue, Vancouver, BC, V5Z 4H4, Canada

5. Department of Molecular Biology and Biochemistry, Simon Fraser University, 8888 University Drive, Burnaby, BC, V5A 1S6, Canada

6. Division of Cancer Epidemiology and Genetics, National Cancer Institute, National Institutes of Health, 9609 Medical Center Drive, MSC 9776, Bethesda, MD 20892, USA

7. Interdepartmental Program in Bioinformatics, UCLA, 621 Charles E. Young Drive South, Los Angeles, CA 90095-1606, USA

8. Centre for Molecular Medicine and Therapeutics, University of British Columbia, 950 West 28th Avenue, Vancouver, BC, V5Z 4H4, Canada

9. School of Biological Sciences, The University of Auckland, Private Bag 92019, Auckland 1142, New Zealand

10. Centre for Brain Research, The University of Auckland, Private Bag 92019, Auckland 1142, New Zealand

11. Center for Evolution and Medicine, Arizona State University, 401 E. Tyler Mall, Tempe, AZ 85287, USA

Abstract

Background: Mammalian X and Y chromosomes share a common evolutionary origin and retain regions of high sequence similarity. Similar sequence content can confound the mapping of short next-generation sequencing reads to a reference genome. It is therefore possible that the presence of both sex chromosomes in a reference genome can cause technical artifacts in genomic data and affect downstream analyses and applications. Understanding this problem is critical for medical genomics and population genomic inference.

Results: Here, we characterize how sequence homology can affect analyses on the sex chromosomes and present XYalign, a new tool that (1) facilitates the inference of sex chromosome complement from next-generation sequencing data; (2) corrects erroneous read mapping on the sex chromosomes; and (3) tabulates and visualizes important metrics for quality control such as mapping quality, sequencing depth, and allele balance. We find that sequence homology affects read mapping on the sex chromosomes and that this has downstream effects on variant calling. However, we show that XYalign can correct mismapping, resulting in more accurate variant calling. We also show how metrics output by XYalign can be used to identify XX and XY individuals across diverse sequencing experiments, including low- and high-coverage whole-genome sequencing, and exome sequencing. Finally, we discuss how the flexibility of the XYalign framework can be leveraged for other uses, including the identification of aneuploidy on the autosomes. XYalign is available open source under the GNU General Public License (version 3).

Conclusions: Sex chromosome sequence homology causes the mismapping of short reads, which in turn affects downstream analyses. XYalign provides a reproducible framework to correct mismapping and improve variant calling on the sex chromosomes.

Funder

Arizona State University

National Institute of General Medical Sciences

National Institutes of Health

Publisher

Oxford University Press (OUP)

Subject

Computer Science Applications, Health Informatics
