Abstract
The inference of phylogenetic trees from sequence data has become a staple of evolutionary research. Bayesian inference of such trees is predominantly based on the Metropolis-Hastings algorithm, which is known to be inefficient for high-dimensional and correlated data. Gradient-based algorithms can speed up such inference. Building on recent research that uses gradient-based approaches to infer phylogenetic trees in a Bayesian framework, I present an algorithm capable of performing No-U-Turn sampling for phylogenetic trees. As an extension of Hamiltonian Monte Carlo methods, No-U-Turn sampling offers the same benefits, such as proposing distant new states with a high acceptance probability, but eliminates the need to tune hyperparameters manually. Evaluated on real data sets, the new sampler converges faster to the target distribution. The results also indicate that the new algorithm traverses a larger number of topologies during sampling than traditional Markov chain Monte Carlo approaches. The new algorithm thus leads to a more efficient exploration of the posterior distribution of phylogenetic tree topologies.
Author summary
Phylogenetic trees are important for our understanding of evolutionary relationships. Even for a small number of entities, however, the number of possible trees is immense. To search this large space efficiently and analyze the distributions of these trees, different algorithmic solutions have been proposed. A phylogenetic tree is defined not only by the way it groups the entities but also by the lengths of its branches. This dual nature of phylogenetic trees complicates the exploration of these distributions. Building on research and algorithmic ideas for non-tree-like data and on recent developments in the mathematics of trees, I propose a new method that is able to traverse the space of possible trees more rapidly. By using a search strategy guided by the data and the underlying evolutionary model, the algorithm is able to sample better from the desired distribution. I show that the new algorithm samples from the correct distribution and does so with higher efficiency than other algorithms.
Publisher
Cold Spring Harbor Laboratory