Pan-genome sequence analysis using Panseq: an online tool for the rapid analysis of core and accessory genomic regions
-
Published:2010-09-15
Issue:1
Volume:11
Page:
-
ISSN:1471-2105
-
Container-title:BMC Bioinformatics
-
language:en
-
Short-container-title:BMC Bioinformatics
Author:
Laing Chad,Buchanan Cody,Taboada Eduardo N,Zhang Yongxiang,Kropinski Andrew,Villegas Andre,Thomas James E,Gannon Victor PJ
Abstract
Abstract
Background
The pan-genome of a bacterial species consists of a core and an accessory gene pool. The accessory genome is thought to be an important source of genetic variability in bacterial populations and is gained through lateral gene transfer, allowing subpopulations of bacteria to better adapt to specific niches. Low-cost and high-throughput sequencing platforms have created an exponential increase in genome sequence data and an opportunity to study the pan-genomes of many bacterial species. In this study, we describe a new online pan-genome sequence analysis program, Panseq.
Results
Panseq was used to identify Escherichia coli O157:H7 and E. coli K-12 genomic islands. Within a population of 60 E. coli O157:H7 strains, the existence of 65 accessory genomic regions identified by Panseq analysis was confirmed by PCR. The accessory genome and binary presence/absence data, and core genome and single nucleotide polymorphisms (SNPs) of six L. monocytogenes strains were extracted with Panseq and hierarchically clustered and visualized. The nucleotide core and binary accessory data were also used to construct maximum parsimony (MP) trees, which were compared to the MP tree generated by multi-locus sequence typing (MLST). The topology of the accessory and core trees was identical but differed from the tree produced using seven MLST loci. The Loci Selector module found the most variable and discriminatory combinations of four loci within a 100 loci set among 10 strains in 1 s, compared to the 449 s required to exhaustively search for all possible combinations; it also found the most discriminatory 20 loci from a 96 loci E. coli O157:H7 SNP dataset.
Conclusion
Panseq determines the core and accessory regions among a collection of genomic sequences based on user-defined parameters. It readily extracts regions unique to a genome or group of genomes, identifies SNPs within shared core genomic regions, constructs files for use in phylogeny programs based on both the presence/absence of accessory regions and SNPs within core regions and produces a graphical overview of the output. Panseq also includes a loci selector that calculates the most variable and discriminatory loci among sets of accessory loci or core gene SNPs.
Availability
Panseq is freely available online at http://76.70.11.198/panseq. Panseq is written in Perl.
Publisher
Springer Science and Business Media LLC
Subject
Applied Mathematics,Computer Science Applications,Molecular Biology,Biochemistry,Structural Biology
Reference49 articles.
1. Ansorge WJ: Next-generation DNA sequencing techniques. New Biotechnology 2009, 25: 195–203. 10.1016/j.nbt.2008.12.009 2. Li H, Ruan J, Durbin R: Mapping short DNA sequencing reads and calling variants using mapping quality scores. Genome Res 2008, 18: 1851–1858. 10.1101/gr.078212.108 3. MacLean D, Jones JDG, Studholme DJ: Application of 'next-generation' sequencing technologies to microbial genetics. Nat Rev Microbiol 2009, 7: 287–296. 4. Stiens M, Becker A, Bekel T, Gödde V, Goesmann A, Niehaus K, Schneiker-Bekel S, Selbitschka W, Weidner S, Schlüter A, Pühler A: Comparative genomic hybridisation and ultrafast pyrosequencing revealed remarkable differences between the Sinorhizobium meliloti genomes of the model strain Rm1021 and the field isolate SM11. J Biotechnol 2008, 136: 31–37. 10.1016/j.jbiotec.2008.04.014 5. Tettelin H, Masignani V, Cieslewicz MJ, Donati C, Medini D, Ward NL, Angiuoli SV, Crabtree J, Jones AL, Durkin AS, Deboy RT, Davidsen TM, Mora M, Scarselli M, Margarity Ros I, Peterson JD, Hauser CR, Sundaram JP, Nelson WC, Madupu R, Brinkac LM, Dodson RJ, Rosovitz MJ, Sullivan SA, Daugherty SC, Haft DH, Selengut J, Gwinn ML, Zhou L, Zafar N, Khouri H, Radune D, Dimitrov G, Watkins K, O'Connor KJB, Smith S, Utterback TR, White O, Rubens CE, Grandi G, Madoff LC, Kasper DL, Telford JL, Wessels MR, Rappuoli R, Fraser CM: Genome analysis of multiple pathogenic isolates of Streptococcus agalactiae : implications for the microbial "pan-genome". Proc Natl Acad Sci USA 2005, 102: 13950–13955. 10.1073/pnas.0506758102
Cited by
233 articles.
订阅此论文施引文献
订阅此论文施引文献,注册后可以免费订阅5篇论文的施引文献,订阅后可以查看论文全部施引文献
|
|