A complete pipeline enables haplotyping and phasing macrohaplotype in long sequencing reads for polyploidy samples and a multi‐source DNA mixture

Authors:

Wang Xuewen¹, Muenzler Melissa¹, King Jonathan¹, Liu Muyi¹, Li Hongmin², Budowle Bruce³,⁴, Ge Jianye¹

Affiliation:

1. Health Science Center, University of North Texas, Fort Worth, Texas, USA

2. College of Science, California State University, East Bay, Hayward, California, USA

3. Department of Forensic Medicine, University of Helsinki, Helsinki, Finland

4. Forensic Science Institute, Radford University, Radford, Virginia, USA

Abstract

Macrohaplotypes combine multiple types of phased DNA variants, increasing forensic discrimination power. High‐quality long sequencing reads, for example PacBio HiFi reads, provide data for detecting macrohaplotypes in polyploid samples and DNA mixtures. However, bioinformatics tools for detecting macrohaplotypes are lacking. In this study, we developed a bioinformatics tool, MacroHapCaller, in which targeted loci (i.e., short tandem repeats [STRs], single nucleotide polymorphisms [SNPs], and insertions and deletions [indels]) are genotyped and then combined by novel algorithms to call macrohaplotypes from long reads. MacroHapCaller uses physical phasing (i.e., read‐backed phasing) to identify macrohaplotypes, so it is not limited to two haplotypes and can detect multi‐allelic macrohaplotypes in a given sample. MacroHapCaller was validated with data generated by our targeted PacBio HiFi sequencing pipeline, which sequenced ∼8‐kb amplicons harboring the 20 core forensic STR loci in the human benchmark samples HG002 and HG003. MacroHapCaller was also validated with whole‐genome long‐read sequencing data. Compared with the known ground truth, MacroHapCaller produced robust and accurate genotypes and phased macrohaplotypes. It achieved genotyping accuracy equal to or higher than that of the existing tools HipSTR and DeepVariant, and ran faster. MacroHapCaller enables efficient macrohaplotype analysis of high‐throughput sequencing data and supports applications that rely on discriminating macrohaplotypes.
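The read‐backed phasing idea described above can be illustrated with a minimal sketch. It assumes that per‐read allele calls at each target locus are already available (the genotyping step) and simply groups reads that span every locus by the ordered tuple of alleles they carry. The function name, the locus names, and the `min_reads`/`min_fraction` thresholds are hypothetical; this is not MacroHapCaller's actual algorithm or parameter set.

```python
from collections import Counter
from typing import Dict, List, Tuple

# Hypothetical per-read calls: locus name -> allele observed on that read
# (e.g. an STR repeat count, a SNP base, or an indel allele). Loci that a read
# does not fully span are simply absent from its dictionary.
ReadCalls = Dict[str, str]


def call_macrohaplotypes(
    reads: List[ReadCalls],
    loci: List[str],
    min_reads: int = 2,
    min_fraction: float = 0.05,
) -> List[Tuple[Tuple[str, ...], int]]:
    """Sketch of read-backed macrohaplotype calling (hypothetical thresholds).

    All alleles on one read come from the same DNA molecule, so each read that
    spans every target locus directly reports one phased allele tuple. Tuples
    supported by at least `min_reads` reads and at least `min_fraction` of the
    spanning reads are reported, so more than two macrohaplotypes can be
    returned for polyploid samples or mixtures.
    """
    # Keep only reads that cover all target loci.
    spanning = [r for r in reads if all(locus in r for locus in loci)]
    total = len(spanning)
    # Count each ordered allele tuple (one tuple per read).
    counts = Counter(tuple(r[locus] for locus in loci) for r in spanning)
    called = [
        (hap, n)
        for hap, n in counts.items()
        if n >= min_reads and n / total >= min_fraction
    ]
    # Most-supported macrohaplotypes first; ties broken by allele tuple.
    return sorted(called, key=lambda item: (-item[1], item[0]))


# Toy example with hypothetical loci: one STR, one SNP, one indel.
example_loci = ["STR_1", "SNP_1", "INDEL_1"]
example_reads = (
    [{"STR_1": "29", "SNP_1": "A", "INDEL_1": "del"}] * 3
    + [{"STR_1": "30", "SNP_1": "G", "INDEL_1": "ins"}] * 2
    + [{"STR_1": "29", "SNP_1": "G", "INDEL_1": "del"}]  # single, likely erroneous read
    + [{"STR_1": "29", "SNP_1": "A"}]  # does not span INDEL_1, so ignored
)

if __name__ == "__main__":
    for hap, n in call_macrohaplotypes(example_reads, example_loci):
        print(hap, n)
```

Because each tuple is counted directly from reads rather than inferred statistically, nothing restricts the output to two haplotypes. A practical caller would additionally need to handle partially spanning reads, STR stutter, and sequencing errors more carefully than a simple support threshold does.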

Funder

National Institute of Justice

Office of Justice Programs

U.S. Department of Justice

Publisher

Wiley
