Abstract
The analytical toolkit of phylogeography has evolved rapidly to keep pace with ever-increasing amounts of genomic data. Despite substantial advances, researchers have not fully explored all the analytical tools that could address the challenge posed by the sheer size of genomic datasets. For example, deep learning techniques such as convolutional neural networks (CNNs), widely employed in image and video classification, remain largely unexplored for phylogeographic model selection. In non-model organisms, the lack of information about ecology, natural history, and evolution can lead to uncertainty about which set of demographic models should be considered. Here we investigate the utility of CNNs for assessing a large number of competing phylogeographic models, using South American lizards as an example, and compare their performance with that of approximate Bayesian computation (ABC). First, we evaluated three demographic scenarios (constant, expansion, and bottleneck) for each of four recovered lineages and found that overall model accuracy was higher than 98% for all lineages. Next, we evaluated a set of 26 models that accounted for evolutionary relationships, gene flow, and changes in effective population size among these lineages and recovered an overall accuracy of 87%. In contrast, ABC was unable to single out a best-fit model among the 26 competing models. Finally, we used the CNN model to investigate the evolutionary history of two South American lizards. Our results indicate the presence of hidden genetic diversity, gene flow between non-sister populations, and changes in effective population sizes through time, likely in response to Pleistocene climatic oscillations. They also demonstrate that CNNs can be easily and usefully incorporated into the phylogeographer’s toolkit.
Publisher
Cold Spring Harbor Laboratory