A Convolutional Auto-Encoder for Haplotype Assembly and Viral Quasispecies Reconstruction

Published: 2020-10-01

Authors:

Ke Ziqi, Vikalo Haris

Abstract

Haplotype assembly and viral quasispecies reconstruction are challenging tasks concerned with the analysis of genomic mixtures using sequencing data. High-throughput sequencing technologies generate enormous numbers of short fragments (reads) that oversample the components of a mixture; this redundancy enables reconstruction of the components (haplotypes, viral strains). The reconstruction problem, known to be NP-hard, boils down to grouping together reads that originate from the same component of the mixture. Existing methods struggle to solve this problem with the required accuracy at low runtime, and the problem becomes increasingly challenging as the number and length of the components grow. This paper proposes a read clustering method based on a convolutional auto-encoder that first projects sequenced fragments into a low-dimensional space and then uses the learned embedded features to estimate the probability of each read's origin. The components are reconstructed by forming consensus sequences from reads of the same origin. Mini-batch stochastic gradient descent and dimensionality reduction of the reads allow the method to handle massive numbers of long reads efficiently. Experiments on simulated, semi-experimental and experimental data demonstrate that the proposed method accurately reconstructs haplotypes and viral quasispecies, often outperforming state-of-the-art methods.
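The final step described above, turning per-read origin probabilities into reconstructed components via consensus sequences, can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: reads are assumed to be already aligned to common reference coordinates and written as strings with a hypothetical gap marker `-` for uncovered positions, and the origin probabilities are hand-written rather than produced by a convolutional auto-encoder. Hard assignment by the most probable component and per-position majority voting are illustrative choices; the names `consensus`, `reconstruct` and `GAP` are hypothetical.

```python
from collections import Counter

GAP = "-"  # hypothetical marker for positions a read does not cover


def consensus(reads):
    """Majority-vote consensus of equal-length aligned reads, ignoring gaps.

    Positions covered by no read stay as GAP; ties are broken alphabetically.
    """
    length = len(reads[0])
    columns = []
    for pos in range(length):
        counts = Counter(r[pos] for r in reads if r[pos] != GAP)
        if not counts:
            columns.append(GAP)
        else:
            # max over sorted keys returns the alphabetically first base among ties
            columns.append(max(sorted(counts), key=counts.__getitem__))
    return "".join(columns)


def reconstruct(reads, origin_probs):
    """Assign each read to its most probable component, then take per-component consensus.

    origin_probs[i][j] is the (assumed) probability that read i came from component j,
    standing in for the soft assignments estimated from the learned embedding.
    """
    k = len(origin_probs[0])
    groups = [[] for _ in range(k)]
    for read, probs in zip(reads, origin_probs):
        best = max(range(k), key=lambda j: probs[j])
        groups[best].append(read)
    # A component that receives no reads cannot be reconstructed.
    return [consensus(g) if g else None for g in groups]


# Toy example: six reads aligned to a 6-bp window, two components.
reads = ["ACGT--", "ACGA--", "--GTTA", "TTCA--", "TTCAGG", "--CAGG"]
origin_probs = [[0.9, 0.1], [0.8, 0.2], [0.7, 0.3],
                [0.1, 0.9], [0.2, 0.8], [0.3, 0.7]]
print(reconstruct(reads, origin_probs))  # ['ACGTTA', 'TTCAGG']
```

Note how the consensus absorbs the sequencing error in the second read (position 4, `A` instead of `T`) because the other reads from the same component outvote it; this is the redundancy the abstract relies on.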

Publisher

Cold Spring Harbor Laboratory


Cited by 3 articles.

1. Human DNA/RNA motif mining using deep-learning methods: a scoping review;Network Modeling Analysis in Health Informatics and Bioinformatics;2023-04-12

2. VStrains: De Novo Reconstruction of Viral Strains via Iterative Path Extraction from Assembly Graphs;Lecture Notes in Computer Science;2023

3. Reconstruction of microbial haplotypes by integration of statistical and physical linkage in scaffolding;2020-03-30