Abstract
The dominant paradigm for analysing genetic variation relies on a central idea: all genomes in a species can be described as minor differences from a single reference genome. However, this approach can be problematic or inadequate for bacteria, where there can be significant sequence divergence within a species.

Reference graphs are an emerging solution to the reference bias issues implicit in the “single-reference” model. Such a graph represents variation at multiple scales within a population, e.g., at the nucleotide and locus level.

The genetic causes of drug resistance in bacteria have proven easy to decode compared with those of human diseases. For example, it is possible to predict resistance to numerous anti-tuberculosis drugs by simply testing for the presence of a list of single nucleotide polymorphisms and insertions/deletions, commonly referred to as a catalogue.

We developed DrPRG (Drug resistance Prediction with Reference Graphs) using the bacterial reference graph method Pandora. First, we outline the construction of a Mycobacterium tuberculosis drug resistance reference graph, a process that can be replicated for other species. The graph is built from a global dataset of isolates with varying drug susceptibility profiles, thus capturing common and rare resistance- and susceptibility-associated haplotypes. We benchmark DrPRG against the existing graph-based tool Mykrobe and the haplotype-based approach of TBProfiler using 44,709 Illumina and 138 Nanopore publicly available samples with associated phenotypes. We find DrPRG has significantly improved sensitivity and specificity for some drugs compared to these tools, with no significant decreases. It uses significantly less memory than both tools and runs significantly faster, except relative to Mykrobe on Nanopore data.

We discover and discuss novel insights into resistance-conferring variation for M. tuberculosis, including deletion of the genes katG and pncA, and suggest mutations that may warrant reclassification as associated with resistance.

Impact statement

Mycobacterium tuberculosis is the bacterium responsible for tuberculosis (TB). TB is one of the leading causes of death worldwide; before the coronavirus pandemic it was the leading cause of death from a single pathogen. Drug-resistant TB incidence has recently increased, making the detection of resistance even more vital. In this study, we develop a new software tool to predict drug resistance from whole-genome sequence data of the pathogen using new reference graph models to represent a reference genome. We evaluate it on M. tuberculosis against existing tools for resistance prediction and show improved performance. Using our method, we discover new resistance-associated variations and discuss reclassification of a selection of existing mutations. As such, this work contributes to TB drug resistance diagnostic efforts. In addition, the method could be applied to any bacterial species, so it is of interest to anyone working on antimicrobial resistance.

Data summary

The authors confirm all supporting data, code and protocols have been provided within the article or through supplementary data files.

The software method presented in this work, DrPRG, is freely available from GitHub under an MIT license at https://github.com/mbhall88/drprg. We used commit 9492f25 for all results via a Singularity [1] container from the URI docker://quay.io/mbhall88/drprg:9492f25.

All code used to generate results for this study is available on GitHub at https://github.com/mbhall88/drprg-paper.
All data used in this work are freely available from the SRA/ENA/DRA, and a copy of the datasheet with all associated phenotype information can be downloaded from the archived repository at https://doi.org/10.5281/zenodo.7819984 or found in the previously mentioned GitHub repository.

The Mycobacterium tuberculosis index used in this work is available to download through DrPRG via the command drprg index --download mtb@20230308 or from GitHub at https://github.com/mbhall88/drprg-index.
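
The catalogue idea mentioned in the abstract, predicting resistance by testing a sample for the presence of listed mutations, can be illustrated with a minimal Python sketch. This is not DrPRG's implementation or its catalogue. The `TOY_CATALOGUE` entries, the `(gene, mutation)` encoding, the `"del"` marker for a whole-gene deletion and the `predict_resistance` function are all assumptions made for illustration, using a few well-known resistance associations.

```python
from typing import Dict, List, Set, Tuple

# A variant is encoded as (gene, mutation); "del" marks a whole-gene deletion.
# This encoding is an assumption for the sketch, not DrPRG's format.
Variant = Tuple[str, str]

# Toy catalogue for illustration only; NOT the catalogue used by DrPRG.
TOY_CATALOGUE: Dict[Variant, str] = {
    ("rpoB", "S450L"): "rifampicin",
    ("katG", "S315T"): "isoniazid",
    ("katG", "del"): "isoniazid",
    ("pncA", "del"): "pyrazinamide",
}


def predict_resistance(
    sample_variants: Set[Variant],
    catalogue: Dict[Variant, str] = TOY_CATALOGUE,
) -> Dict[str, List[Variant]]:
    """Map each drug to the catalogue variants found in the sample."""
    calls: Dict[str, List[Variant]] = {}
    # Sort so the supporting variants are listed in a stable order.
    for variant in sorted(sample_variants):
        drug = catalogue.get(variant)
        if drug is not None:
            calls.setdefault(drug, []).append(variant)
    return calls


# One catalogued variant plus one variant absent from the toy catalogue.
sample = {("katG", "S315T"), ("gyrA", "E21Q")}
print(predict_resistance(sample))
```

A drug with no entry in the returned dictionary is simply not called resistant here. A real tool must also handle variant calling, missing data and unknown mutations, all of which this sketch ignores.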
Publisher
Cold Spring Harbor Laboratory