Abstract
Ancient whole-genome duplications (WGDs), previous genome duplication events that have since been eroded via diploidization, are increasingly identified throughout eukaryotes. One constraint on large-scale studies of ancient eukaryotic WGD is that relatively large, high-quality datasets are often needed to establish ancient WGD events definitively; the lower-input alternative, interpretation of plots of genome-wide synonymous substitution rates (Ks plots), is prone to bias and inconsistency. We address the shortcomings of the current Ks plot method by building a Ks plot simulator. This data-agnostic approach simulates common distributions found in Ks plots in the presence or absence of ancient WGD signatures. In conjunction with a machine-learning classifier, this approach can quickly assess the likelihood that transcriptomic and genomic data bear WGD signatures. On independently generated synthetic data and real plant transcriptomic data, SLEDGE correctly identifies ancient WGD in 93-100% of samples. This approach can serve as a quick classification step in large-scale genomic analyses, identifying putative ancient polyploids for further study.
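To make the simulation idea concrete, here is a minimal sketch of generating synthetic Ks distributions with and without a WGD signature. It is not SLEDGE's implementation: the mixture model (an exponentially decaying background of small-scale duplicates plus a Gaussian peak for WGD-derived pairs), the function names `simulate_ks` and `histogram`, and every parameter value are assumptions chosen for illustration only.

```python
import random


def simulate_ks(n_pairs=2000, wgd=False, peak_ks=0.8, peak_sd=0.15,
                wgd_fraction=0.4, max_ks=3.0, seed=None):
    """Draw synthetic Ks values for paralog pairs (toy model, hypothetical parameters)."""
    rng = random.Random(seed)
    values = []
    while len(values) < n_pairs:
        if wgd and rng.random() < wgd_fraction:
            # Assumed: WGD-derived pairs cluster around a single Ks value.
            ks = rng.gauss(peak_ks, peak_sd)
        else:
            # Assumed: background duplicates become rarer as Ks grows.
            ks = rng.expovariate(1.5)
        if 0.0 <= ks <= max_ks:  # discard draws outside the plotted range
            values.append(ks)
    return values


def histogram(values, n_bins=30, max_ks=3.0):
    """Return normalised bin heights, a fixed-length vector a classifier could take."""
    counts = [0] * n_bins
    width = max_ks / n_bins
    for v in values:
        counts[min(int(v / width), n_bins - 1)] += 1
    total = len(values)
    return [c / total for c in counts]


background = histogram(simulate_ks(wgd=False, seed=1))
with_wgd = histogram(simulate_ks(wgd=True, seed=1))
```

Labelled histograms of this kind could then serve as training examples for any standard classifier; the choice of classifier and features in the actual tool is not specified here.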
Publisher
Cold Spring Harbor Laboratory
Cited by
3 articles.