ComHapDet: a spatial community detection algorithm for haplotype assembly-Reference-Cited by-同舟云学术

ComHapDet: a spatial community detection algorithm for haplotype assembly

Published:2020-09 Issue:S9 Volume:21 Page:
ISSN:1471-2164
Container-title:BMC Genomics
language:en
Short-container-title:BMC Genomics

Author:

Sankararaman Abishek,Vikalo Haris,Baccelli François

Abstract

Abstract Background Haplotypes, the ordered lists of single nucleotide variations that distinguish chromosomal sequences from their homologous pairs, may reveal an individual’s susceptibility to hereditary and complex diseases and affect how our bodies respond to therapeutic drugs. Reconstructing haplotypes of an individual from short sequencing reads is an NP-hard problem that becomes even more challenging in the case of polyploids. While increasing lengths of sequencing reads and insert sizes helps improve accuracy of reconstruction, it also exacerbates computational complexity of the haplotype assembly task. This has motivated the pursuit of algorithmic frameworks capable of accurate yet efficient assembly of haplotypes from high-throughput sequencing data. Results We propose a novel graphical representation of sequencing reads and pose the haplotype assembly problem as an instance of community detection on a spatial random graph. To this end, we construct a graph where each read is a node with an unknown community label associating the read with the haplotype it samples. Haplotype reconstruction can then be thought of as a two-step procedure: first, one recovers the community labels on the nodes (i.e., the reads), and then uses the estimated labels to assemble the haplotypes. Based on this observation, we propose – a novel assembly algorithm for diploid and ployploid haplotypes which allows both bialleleic and multi-allelic variants. Conclusions Performance of the proposed algorithm is benchmarked on simulated as well as experimental data obtained by sequencing Chromosome 5 of tetraploid biallelic Solanum-Tuberosum (Potato). The results demonstrate the efficacy of the proposed method and that it compares favorably with the existing techniques.

Publisher

Springer Science and Business Media LLC

Subject

Genetics,Biotechnology

Link

https://link.springer.com/content/pdf/10.1186/s12864-020-06935-x.pdf

Reference33 articles.

1. Clark AG. The role of haplotypes in candidate gene studies. Genet Epidemiol Off Pub Int Genet Epidemiol Soc. 2004; 27(4):321–33.

2. Sabeti PC, Reich DE, Higgins JM, Levine HZ, Richter DJ, Schaffner SF, Gabriel SB, Platko JV, Patterson NJ, McDonald GJ, et al. Detecting recent positive selection in the human genome from haplotype structure. Nature. 2002; 419(6909):832–7.

3. Consortium PGS, et al. Genome sequence and analysis of the tuber crop potato. Nature. 2011; 475(7355):189.

4. Lancia G, Bafna V, Istrail S, Lippert R, Schwartz R. SNPs problems, complexity, and algorithms. In: European Symposium on Algorithms. Springer: 2001. p. 182–93.

5. Duitama J, Huebsch T, McEwen G, Suk E-K, Hoehe MR. ReFHap: a reliable and fast algorithm for single individual haplotyping. In: Proceedings of the First ACM International Conference on Bioinformatics and Computational Biology. ACM: 2010. p. 160–9.

Cited by 7 articles. 订阅此论文施引文献订阅此论文施引文献，注册后可以免费订阅5篇论文的施引文献，订阅后可以查看论文全部施引文献

1. GCphase: an SNP phasing method using a graph partition and error correction algorithm;BMC Bioinformatics;2024-08-19

2. Pairwise comparative analysis of six haplotype assembly methods based on users’ experience;BMC Genomic Data;2023-06-29

3. Towards accurate, contiguous and complete alignment-based polyploid phasing algorithms;Genomics;2022-05

4. flopp: Extremely Fast Long-Read Polyploid Haplotype Phasing by Uniform Tree Partitioning;Journal of Computational Biology;2022-02-01

5. Genomics and functional genomics in Leishmania and Trypanosoma cruzi: statuses, challenges and perspectives;Memórias do Instituto Oswaldo Cruz;2021