Nucleotide-resolution bacterial pan-genomics with reference graphs

Published: 2020-11-12

Author:

Colquhoun Rachel M, Hall Michael B, Lima Leandro, Roberts Leah W, Malone Kerri M, Hunt Martin, Letcher Brice, Hawkey Jane, George Sophie, Pankhurst Louise, Iqbal Zamin

Abstract

Background: Bacterial genomes follow a U-shaped frequency distribution whereby most genomic loci are either rare (accessory) or common (core); the union of these is the pan-genome. The alignable fraction of two genomes from a single species can be low (e.g. 50-70%), such that no single reference genome can access all single nucleotide polymorphisms (SNPs). The pragmatic solution is to choose a close reference, and analyse SNPs only in the core genome. Given much bacterial adaptability hinges on the accessory genome, this is an unsatisfactory limitation.

Results: We present a novel pan-genome graph structure and algorithms implemented in the software pandora, which approximates a sequenced genome as a recombinant of reference genomes, detects novel variation and then pan-genotypes multiple samples. The method takes fastq as input and outputs a multi-sample VCF with respect to an inferred data-dependent reference genome, and is available at https://github.com/rmcolq/pandora.

Constructing a reference graph from 578 E. coli genomes, we analyse a diverse set of 20 E. coli isolates. We show pandora recovers at least 13k more rare SNPs than single-reference based tools, achieves equal or better error rates with Nanopore as with Illumina data, 6-24x lower Nanopore error rates than other tools, and provides a stable framework for analysing diverse samples without reference bias. We also show that our inferred recombinant VCF reference genome is significantly better than simply picking the closest RefSeq reference.

Conclusions: This is a step towards comprehensive cohort analysis of bacterial pan-genomic variation, with potential impacts on genotype/phenotype and epidemiological studies.

Publisher

Cold Spring Harbor Laboratory

References: 75 articles.

1. Genetic drift, selection and the evolution of the mutation rate

2. Impact of recombination on bacterial evolution

3. Neutral Theory, Microbial Practice: Challenges in Bacterial Population Genetics

4. The Bacterial Species Challenge: Making Sense of Genetic and Ecological Diversity

5. Deletional bias and the evolution of bacterial genomes

Cited by 5 articles.

1. Gramtools enables multiscale variation analysis with genome graphs;Genome Biology;2021-09-06

2. Methods and Developments in Graphical Pangenomics;Journal of the Indian Institute of Science;2021-07

3. Simplitigs as an efficient and scalable representation of de Bruijn graphs;Genome Biology;2021-04-06

4. Enabling multiscale variation analysis with genome graphs;2021-02-03

5. Extraintestinal pathogenic Escherichia coli (ExPEC) are associated with prolonged carriage of extended-spectrum β-lactamase-producing E. coli acquired during travel;2020-09-24