Abstract
Plasmids are a key vector of antibiotic resistance, but the current bioinformatics toolkit is not well suited to tracking them. The rapid structural changes seen in plasmid genomes present considerable challenges to evolutionary and epidemiological analysis. Typical approaches are either low resolution (replicon typing) or use shared k-mer content to define a genetic distance. However, this distance can both overestimate plasmid relatedness by ignoring rearrangements and underestimate it by over-penalising gene gain/loss. Therefore, a model is needed that captures the key components of how plasmid genomes evolve structurally: through gene/block gain or loss, and through rearrangement. A secondary requirement is to prevent promiscuous transposable elements (TEs) from causing over-clustering of unrelated plasmids. We choose the “Double Cut and Join Indel” (DCJ-Indel) model, in which plasmids are studied at a coarse level, as a sequence of signed integers (representing genes or aligned blocks), and the distance between two plasmids is the minimum number of rearrangement events or indels needed to transform one into the other. We show how this gives much more meaningful distances between plasmids. We introduce a software workflow, pling (https://github.com/iqbal-lab-org/pling), which uses the DCJ-Indel model to calculate distances between plasmids and then cluster them. In our approach, we combine containment distances and DCJ-Indel distances to build a TE-aware plasmid network. We demonstrate better performance and interpretability than other plasmid clustering tools on the “Russian Doll” dataset and a hospital transmission dataset.
Impact statement
Studying plasmid transmission is a necessary component of understanding the spread of antibiotic resistance, but identifying recently related plasmids is difficult and often requires manual curation. Pling simplifies this by leveraging a combination of containment distances and rearrangement distances to cluster plasmids. The result is clusters of recently related plasmids with a clear backbone and relatively large core genomes, in contrast to other tools, which sometimes overcluster. Additionally, the network constructed by pling provides a framework with which to spot evolutionary events, such as potential fusions of plasmids and the spread of transposable elements.
Data summary
Supplementary information and figures are available as an additional PDF. The tool presented in this paper is available at https://github.com/iqbal-lab-org/pling. Additional computational analysis and scripts are described and provided at https://github.com/babayagaofficial/pling_paper_analyses. The sequence data used can be found under BioProject no. PRJNA246471 in the National Center for Biotechnology Information for the “Russian doll” dataset (https://www.ncbi.nlm.nih.gov/bioproject/PRJNA246471), and under Project no. PRJEB31034 in the European Nucleotide Archive for the “Addenbrookes” dataset (https://www.ebi.ac.uk/ena/browser/view/PRJEB30134). All other genome sequences used were sourced from PLSDB (https://ccb-microbe.cs.uni-saarland.de/plsdb/), and lists of accession numbers can be found in the additional analysis GitHub repository.
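The abstract's two criticisms of content-based distances can be illustrated with a toy example. The sketch below assumes plasmids are already encoded as lists of signed integers (one integer per gene or aligned block, sign giving orientation). The function names and the set-based definitions are hypothetical simplifications chosen for illustration; they are not pling's implementation, and the DCJ-Indel distance itself is not computed here.

```python
def _block_set(plasmid):
    """Unsigned block identifiers of a plasmid; order and orientation are discarded."""
    blocks = {abs(b) for b in plasmid}
    if not blocks:
        raise ValueError("plasmid must contain at least one block")
    return blocks


def jaccard_distance(a, b):
    """1 - |A ∩ B| / |A ∪ B| over block identifiers (a stand-in for shared-content distances)."""
    sa, sb = _block_set(a), _block_set(b)
    return 1 - len(sa & sb) / len(sa | sb)


def containment_distance(a, b):
    """1 - |A ∩ B| / min(|A|, |B|): zero when the smaller plasmid is fully contained in the larger."""
    sa, sb = _block_set(a), _block_set(b)
    return 1 - len(sa & sb) / min(len(sa), len(sb))


# Illustrative plasmids (made-up data, not taken from any dataset in the paper).
backbone = [1, 2, 3, 4, 5]
rearranged = [1, -4, -3, -2, 5]                # blocks 2..4 inverted as a unit
with_gain = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]    # same backbone plus five gained blocks

# A content-only distance cannot see the inversion at all.
print(jaccard_distance(backbone, rearranged))      # 0.0
# Gene gain inflates the Jaccard distance even though the backbone is intact...
print(jaccard_distance(backbone, with_gain))       # 0.5
# ...whereas containment treats the smaller plasmid as fully shared.
print(containment_distance(backbone, with_gain))   # 0.0
```

In this toy setting, the inversion is invisible to any measure that only compares content, which is the gap a rearrangement-aware distance such as DCJ-Indel is meant to fill, while containment avoids penalising gain or loss of blocks around a shared backbone.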
Publisher
Cold Spring Harbor Laboratory
Cited by
2 articles.