Abstract
Gene flow between species is increasingly recognized as an important evolutionary process with potential adaptive consequences. Recent methodological advances make it possible to infer different modes of gene flow from genome-scale data, including pulse introgression at a specific time and continuous gene flow over an extended time period. However, it remains challenging to infer the history of species divergence and between-species gene flow from genomic sequence data. As a result, the models used in real data analysis may often be misspecified, potentially leading to incorrect biological interpretations. Here, we characterize biases in parameter estimation under continuous migration models using a combination of asymptotic analysis and posterior inference from simulated datasets. When sequence data are generated under a pulse introgression model, isolation-with-initial-migration models, which assume no recent gene flow, recover gene flow better and with less bias than models that assume recent gene flow. When gene flow is assigned to an incorrect branch in the phylogeny, estimates of the migration rate and species divergence times may be severely biased. When the direction of gene flow is incorrectly assumed, gene flow may still be detected if it is recent and between non-sister species, but not if it is ancestral and between sister species. Overall, the impact of model misspecification is local within the species phylogeny. The pulse introgression model appears to be more robust to model misspecification than the continuous migration model and is preferable in real data analysis unless there is substantive evidence for continuous gene flow.
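To make the contrast between the two modes of gene flow concrete, the following is a minimal sketch of our own (not the inference model used in the study). It tracks a single lineage sampled from a recipient species B backward in time and ignores coalescence. It assumes one-directional gene flow from a donor species A, a hypothetical per-generation migration rate `m` acting over a window from `t_start` to `t_end` for the continuous model, and a hypothetical introgression probability `phi` applied at a single time `tau` for the pulse model. Continuous migration spreads the transfer probability over the whole window, whereas a pulse concentrates it at one instant.

```python
import math


def p_traced_continuous(t, m, t_start, t_end):
    """Probability that a lineage sampled in recipient species B has moved
    (backward in time) into donor species A by time t, assuming migration
    A -> B at rate m per generation between t_start and t_end.
    Coalescence is ignored and migration is one-directional, so a lineage
    that reaches A stays there."""
    if t <= t_start:
        return 0.0
    exposure = min(t, t_end) - t_start  # time spent inside the migration window
    return 1.0 - math.exp(-m * exposure)


def p_traced_pulse(t, phi, tau):
    """Same probability under a single introgression pulse at time tau, at which
    each lineage in B is transferred to A with probability phi."""
    return phi if t >= tau else 0.0


# Hypothetical values: a pulse whose total transfer probability matches
# continuous migration at rate m over a 20,000-generation window.
m, t_start, t_end, tau = 1e-5, 0, 20_000, 10_000
phi = 1.0 - math.exp(-m * (t_end - t_start))

for t in (5_000, 15_000, 30_000):
    print(t, p_traced_continuous(t, m, t_start, t_end), p_traced_pulse(t, phi, tau))
```

Once the window and the pulse are both in the past, the two models give the same total transfer probability; they differ only in when transfers occur. Differences in timing of this kind are one reason a model that assumes the wrong mode of gene flow can distort estimates of migration rates and divergence times.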
Publisher
Cold Spring Harbor Laboratory