An enrichment method for mapping ambiguous reads to the reference genome for NGS analysis

Published: 2019-12
Volume: 17, Issue: 06, Page: 1940012
ISSN: 0219-7200
Journal: Journal of Bioinformatics and Computational Biology
Journal abbreviation: J. Bioinform. Comput. Biol.
Language: English

Authors:

Liu Yuan¹, Ma Yongchao¹, Salsman Evan², Manthey Frank A.², Elias Elias M.², Li Xuehui², Yan Changhui¹

Affiliations:

1. Department of Computer Science, North Dakota State University, Fargo, North Dakota 58102, USA

2. Department of Plant Sciences, North Dakota State University, Fargo, North Dakota 58102, USA

Abstract

Mapping short reads to a reference genome is an essential step in many next-generation sequencing (NGS) analyses. In plants with large genomes, a large fraction of the reads can align to multiple locations in the genome with equally good alignment scores. Deciding where to place these ambiguous reads is a challenging problem that strongly affects downstream analysis. Traditionally, the default method assigns each ambiguous read randomly to one of its potential locations. In this study, we explore two alternative methods based on the hypothesis that the probability that an ambiguous read was generated by a given location is proportional to the total number of reads produced by that location: (1) the enrichment method, which assigns an ambiguous read to the potential location that has produced the most reads, and (2) the probability method, which assigns an ambiguous read to a location with probability proportional to the number of reads that location produces. We systematically compared the performance of the proposed methods with that of the default random method. Our results showed that the enrichment method outperformed both the default random method and the probability method in the discovery of single nucleotide polymorphisms (SNPs). It produced not only more SNP markers but also higher-quality markers, as demonstrated by multiple mainstay genomic analyses, including genome-wide association studies (GWAS), minor allele distribution, population structure, and genomic prediction.
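The three assignment rules described in the abstract can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names and location labels are hypothetical, the per-location read counts are assumed to come from uniquely mapped reads and to stay fixed while ambiguous reads are assigned, ties in the enrichment method are assumed to be broken at random, and the probability method is assumed to fall back to a uniform choice when every candidate has a count of zero.

```python
import random
from collections import Counter
from typing import Hashable, Mapping, Sequence

# Hypothetical sketch, not the authors' code. A "location" is any hashable
# label, e.g. "chr1A:1000" (the labels used below are made up).
Location = Hashable


def assign_random(candidates: Sequence[Location], rng: random.Random) -> Location:
    """Default method: choose one candidate location uniformly at random."""
    return rng.choice(list(candidates))


def assign_enrichment(
    candidates: Sequence[Location],
    counts: Mapping[Location, int],
    rng: random.Random,
) -> Location:
    """Enrichment method: choose the candidate that produced the most reads.

    Assumption: ties between equally enriched candidates are broken at random.
    """
    best = max(counts.get(loc, 0) for loc in candidates)
    top = [loc for loc in candidates if counts.get(loc, 0) == best]
    return rng.choice(top)


def assign_probability(
    candidates: Sequence[Location],
    counts: Mapping[Location, int],
    rng: random.Random,
) -> Location:
    """Probability method: choose a candidate with probability proportional
    to its read count.

    Assumption: if all candidate counts are 0, fall back to a uniform choice.
    """
    pool = list(candidates)
    weights = [counts.get(loc, 0) for loc in pool]
    if sum(weights) == 0:
        return rng.choice(pool)
    return rng.choices(pool, weights=weights, k=1)[0]


# Assumption: counts are taken from uniquely mapped reads and are not
# updated as ambiguous reads are assigned.
unique_hits = ["chr1A:1000"] * 3 + ["chr1B:1000"]
counts = Counter(unique_hits)
candidates = ["chr1A:1000", "chr1B:1000", "chr1D:1000"]
rng = random.Random(42)

print(assign_enrichment(candidates, counts, rng))   # chr1A:1000
print(assign_probability(candidates, counts, rng))  # chr1A:1000 with probability 3/4, else chr1B:1000
print(assign_random(candidates, rng))               # any of the three candidates
```

Here the enrichment rule returns chr1A:1000 deterministically because it holds the unique maximum count, while the probability rule returns it three times out of four on average and never returns chr1D:1000, whose count is zero.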

Publisher

World Scientific Pub Co Pte Ltd

Subjects

Computer Science Applications, Molecular Biology, Biochemistry

Link

https://www.worldscientific.com/doi/pdf/10.1142/S0219720019400122

Cited by 3 articles.

1. Disregarding multimappers leads to biases in the functional assessment of NGS data. BMC Genomics, 2024-05-08.

2. Sequence deeper without sequencing more: Bayesian resolution of ambiguously mapped reads. PLOS Computational Biology, 2021-04-19.

3. Foreword: Special Issue for the 2019 International Conference on Bioinformatics and Computational Biology (BICOB-2019). Journal of Bioinformatics and Computational Biology, 2019-12.