Machine learning based imputation techniques for estimating phylogenetic trees from incomplete distance matrices

Published: 2019-08-22

Author:

Bhattacharjee Ananya, Bayzid Md. Shamsuzzoha

Abstract

Background: Recent advances in sequencing technologies, together with species tree estimation methods that take gene tree discordance into account, have enabled notable progress in constructing large-scale phylogenetic trees from genome-wide data. However, substantial challenges remain in leveraging this huge amount of molecular data. One of the foremost among these challenges is the need for efficient tools that can handle missing data. Popular distance-based methods such as neighbor joining and UPGMA require that the input distance matrix contain no missing values.

Results: We introduce two highly accurate machine learning based distance imputation techniques. One is based on matrix factorization, and the other is an autoencoder-based deep learning technique. We evaluate these two techniques on a collection of simulated and biological datasets, and show that they match or improve upon the best alternative techniques for distance imputation. Moreover, our proposed techniques can handle a substantial amount of missing data, to the extent where the best alternative methods fail.

Conclusions: This study shows for the first time the power and feasibility of applying deep learning techniques for imputing distance matrices. The autoencoder-based deep learning technique is highly accurate and scalable to large datasets. We have made these techniques freely available as cross-platform software (available at https://github.com/Ananya-Bhattacharjee/ImputeDistances).
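To make the matrix factorization idea concrete, the following is a minimal sketch of how missing entries in a distance matrix could be filled in before running neighbor joining or UPGMA. It is not the authors' implementation (see the linked repository for that). It assumes the distance matrix is approximately low rank, marks missing entries as NaN, and fits a masked factorization by alternating regularized least squares. The function name `impute_distances` and the parameters `rank`, `n_iters`, and `reg` are hypothetical choices made for illustration.

```python
import numpy as np


def impute_distances(D, rank=2, n_iters=200, reg=0.1, seed=0):
    """Fill NaN entries of a symmetric distance matrix.

    Illustrative sketch only: fits D ~ U @ V.T on the observed entries
    with alternating ridge-regularized least squares.
    """
    D = np.asarray(D, dtype=float)
    n = D.shape[0]
    observed = ~np.isnan(D)          # mask of known distances
    rng = np.random.default_rng(seed)
    U = rng.random((n, rank))
    V = rng.random((n, rank))
    ridge = reg * np.eye(rank)       # keeps every linear system solvable

    for _ in range(n_iters):
        # Update each row of U using only the observed entries in that row.
        for i in range(n):
            idx = observed[i]
            A = V[idx].T @ V[idx] + ridge
            b = V[idx].T @ D[i, idx]
            U[i] = np.linalg.solve(A, b)
        # Update each row of V using only the observed entries in that column.
        for j in range(n):
            idx = observed[:, j]
            A = U[idx].T @ U[idx] + ridge
            b = U[idx].T @ D[idx, j]
            V[j] = np.linalg.solve(A, b)

    approx = U @ V.T
    approx = (approx + approx.T) / 2.0       # enforce symmetry
    filled = np.where(observed, D, approx)   # keep known distances untouched
    filled = np.maximum(filled, 0.0)         # distances cannot be negative
    np.fill_diagonal(filled, 0.0)            # self-distance is zero
    return filled


# Toy example: the distance between taxa 1 and 2 is missing.
D = np.array([
    [0.0, 0.3, 0.5, 0.6],
    [0.3, 0.0, np.nan, 0.7],
    [0.5, np.nan, 0.0, 0.4],
    [0.6, 0.7, 0.4, 0.0],
])
filled = impute_distances(D)
print(filled)
```

The completed matrix can then be passed to any distance-based tree builder. Observed distances are left unchanged, so only the imputed entries depend on the assumed rank and regularization strength.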

Publisher

Cold Spring Harbor Laboratory

Cited by 1 article.

1. A systematic review of machine learning-based missing value imputation techniques; Data Technologies and Applications; 2021-04-02