SNP detection and genotyping from low-coverage sequencing data on multiple diploid samples

Published:2010-10-27 Issue:6 Volume:21 Page:952-960
ISSN:1088-9051
Container-title:Genome Research
language:en
Short-container-title:Genome Res.

Author:

Le Si Quang, Durbin Richard

Abstract

Reductions in the cost of sequencing have enabled whole-genome sequencing to identify sequence variants segregating in a population. An efficient approach is to sequence many samples at low coverage, then to combine data across samples to detect shared variants. Here, we present methods to discover and genotype single-nucleotide polymorphism (SNP) sites from low-coverage sequencing data, making use of shared haplotype (linkage disequilibrium) information. For each population, we first collect SNP candidates based on independent sequence calls per site. We then use MARGARITA with genotype or phased haplotype data from the same samples to collect 20 ancestral recombination graphs (ARGs). We refine the posterior probability of SNP candidates by considering possible mutations at internal branches of the 40 marginal ancestral trees inferred from the 20 ARGs at the left and right flanking genotype sites. Using a population genetic prior distribution on tree-branch length and Bayesian inference, we determine a posterior probability of the SNP being real and also the most probable phased genotype call for each individual. We present experiments on both simulated data and real data from the 1000 Genomes Project to demonstrate the applicability of the methods. We also explore the relative tradeoff between sequencing depth and the number of sequenced samples.
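
The first step described in the abstract, scoring a candidate site from independent per-sample read counts, can be illustrated with a minimal sketch. This is not the authors' method: it ignores haplotype and ARG information entirely and assumes a simple binomial read-error model, Hardy-Weinberg genotype proportions, and a uniform grid over allele frequency. The function names, the `error_rate` of 0.01, and the `prior_snp` of 0.001 are all hypothetical choices made for illustration.

```python
from math import comb


def genotype_likelihoods(ref_reads, alt_reads, error_rate=0.01):
    """P(reads | genotype) for 0, 1, or 2 copies of the alternate allele.

    Assumed model: each read shows the alternate allele with probability
    error_rate (hom-ref), 0.5 (het), or 1 - error_rate (hom-alt).
    """
    n = ref_reads + alt_reads
    alt_probs = (error_rate, 0.5, 1.0 - error_rate)
    return [comb(n, alt_reads) * p ** alt_reads * (1.0 - p) ** ref_reads
            for p in alt_probs]


def site_likelihood(samples, alt_freq, error_rate=0.01):
    """P(all reads at the site | alternate allele frequency), assuming HWE."""
    f = alt_freq
    hwe = ((1.0 - f) ** 2, 2.0 * f * (1.0 - f), f ** 2)
    total = 1.0
    for ref_reads, alt_reads in samples:
        gl = genotype_likelihoods(ref_reads, alt_reads, error_rate)
        total *= sum(w * l for w, l in zip(hwe, gl))
    return total


def snp_posterior(samples, prior_snp=0.001, error_rate=0.01, grid_size=99):
    """Posterior probability that the site is variant (hypothetical prior)."""
    # Null model: the alternate allele is absent from the population.
    null = site_likelihood(samples, 0.0, error_rate)
    # Variant model: average the likelihood over a uniform frequency grid.
    freqs = [(i + 1) / (grid_size + 1) for i in range(grid_size)]
    variant = sum(site_likelihood(samples, f, error_rate)
                  for f in freqs) / grid_size
    numerator = prior_snp * variant
    return numerator / (numerator + (1.0 - prior_snp) * null)


# Ten samples at 4x depth, given as (ref_reads, alt_reads) pairs.
# Five of them show two alternate reads each.
samples = [(2, 2)] * 5 + [(4, 0)] * 5
print(snp_posterior(samples))
```

Because the evidence is pooled across samples, a few alternate reads seen in several individuals can support a variant call even though no single sample at this depth would be convincing on its own. The haplotype-based refinement described in the abstract goes beyond this per-site view.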

Publisher

Cold Spring Harbor Laboratory

Subject

Genetics (clinical), Genetics

Reference 20 articles.

1. A map of human genome variation from population-scale sequencing

2. Albers CA, Lunter G, MacArthur DG, McVean G, Ouwehand WH, Durbin R. 2011. Dindel: Accurate indel calls from short-read data. Genome Res (this issue). doi: 10.1101/gr.112326.110

3. Simultaneous Genotype Calling and Haplotype Phasing Improves Genotype Accuracy and Reduces False-Positive Associations for Genome-wide Association Studies

4. Fast and flexible simulation of DNA sequence data

5. A Flexible and Accurate Genotype Imputation Method for the Next Generation of Genome-Wide Association Studies

Cited by 129 articles.

1. High-throughput and cost-effective genotyping by low-coverage whole genome sequencing with genotype imputation in Pacific oyster, Crassostrea gigas;Aquaculture;2024-10

2. From sub-Saharan Africa to China: Evolutionary history and adaptation of Drosophila melanogaster revealed by population genomics;Science Advances;2024-04-19

3. Development and evaluation of a haplotype reference panel of Zhikong scallop (Chlamys farreri) for genotype imputation;Aquaculture;2024-03

4. Unified Multi-caller Ensemble (UME) generates an unbiased maize haplotype map for variable coverage whole genome data;2023-12-08

5. Efficient Two-Stage Analysis for Complex Trait Association with Arbitrary Depth Sequencing Data;Stats;2023-03-19