A sequential Monte Carlo framework for haplotype inference in CNV/SNP genotype data
-
Published:2014-04-24
Issue:1
Volume:2014
Page:
-
ISSN:1687-4153
-
Container-title:EURASIP Journal on Bioinformatics and Systems Biology
-
language:en
-
Short-container-title:J Bioinform Sys Biology
Author:
Iliadis Alexandros, Anastassiou Dimitris, Wang Xiaodong
Abstract
Copy number variations (CNVs) are abundant in the human genome. They have been associated with complex traits in genome-wide association studies (GWAS) and are expected to continue to play an important role in identifying the etiology of disease phenotypes. Current high-throughput whole-genome single-nucleotide polymorphism (SNP) arrays yield datasets that contain integer copy numbers in CNV regions alongside SNP genotypes. Haplotypes have been shown to offer advantages over genotypes in identifying disease traits, but although they are available for SNP genotypes, they are largely unavailable for CNV/SNP data because of insufficient computational tools. We introduce a new framework for inferring haplotypes in CNV/SNP data using a sequential Monte Carlo sampling scheme, ‘Tree-Based Deterministic Sampling CNV’ (TDSCNV). We compare our method with polyHap (v2.0), the only currently available software able to perform inference in CNV/SNP genotypes, on datasets with varying numbers of markers. Both algorithms show similar accuracy, but TDSCNV is an order of magnitude faster and scales linearly with the number of markers and the number of individuals, and thus could be the method of choice for haplotype inference in such datasets. Our method is implemented in the TDSCNV package, which is available for download at http://www.ee.columbia.edu/~anastas/tdscnv.
Publisher
Springer Science and Business Media LLC
Subject
Computational Mathematics; Computer Science Applications; General Biochemistry, Genetics and Molecular Biology
Cited by
2 articles.