Retrieving Good-Quality Salmonella Genomes From the GenBank Database Using a Python Tool, SalmoDEST

Author:

Cherchame Emeline (1, 2), Ilango Guy (1, 3), Cadel-Six Sabrina (1)

Affiliation:

1. Salmonella and Listeria Unit, Laboratory for Food Safety, ANSES, Maisons-Alfort, France

2. Paris Brain Institute (ICM), Paris, France

3. Research Center for Respiratory Diseases, INSERM UMR 1100, Tours, France

Abstract

With the advent of next-generation whole-genome sequencing (WGS), the need for good-quality and well-characterised Salmonella genomes has increased over the past years. Good-quality complete genomes are often required for assembly reference mapping or phylogenetic single nucleotide polymorphism (SNP) analysis. Complete genomes or contigs from specific sources or serovars are also sought for clustering analysis or source attribution studies. Therefore, new bioinformatics tools are needed for the extraction of good-quality and well-characterised genomes from public databases. Here, we developed SalmoDEST, an open-source Python tool capable of extracting Salmonella genomes with a coverage higher than 50x and a genome length over 4 Mb from the GenBank database in the form of complete genomes or contigs, with verification of the serovar to which they belong and identification of the corresponding multilocus sequence type (MLST) profile. To validate the ability of SalmoDEST to screen for and retrieve genomes of good quality, we compared our results for S. Typhi complete genomes with those available in the literature and extracted Salmonella genomes of strains from bovine sources isolated worldwide. Finally, we provide in this study a list of 239 high-quality complete genomes for 123 serovars of Salmonella. SalmoDEST is a handy and easy-to-use open-source tool to extract complete genomes or contigs that can be routinely used in public health, food safety and research laboratories. SalmoDEST (SALMOnella Download gEnome Serotype sT) is available at https://github.com/I-Guy/SalmoDEST.
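
The two quality thresholds stated in the abstract (coverage higher than 50x, genome length over 4 Mb) can be illustrated with a minimal Python sketch. This is not SalmoDEST's actual code: the `GenomeRecord` class, its field names, the function names and the placeholder accessions are all hypothetical, and the sketch assumes that coverage, length and serovar metadata have already been obtained for each genome.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional

# Thresholds taken from the abstract; both comparisons are strict.
MIN_COVERAGE = 50.0        # sequencing depth, in "x"
MIN_LENGTH_BP = 4_000_000  # 4 Mb


@dataclass
class GenomeRecord:
    """Hypothetical metadata record; not SalmoDEST's data model."""
    accession: str
    serovar: str
    coverage: float
    length_bp: int
    is_complete: bool  # True for a complete genome, False for contigs


def passes_quality(rec: GenomeRecord) -> bool:
    """Return True if the record exceeds both quality thresholds."""
    return rec.coverage > MIN_COVERAGE and rec.length_bp > MIN_LENGTH_BP


def select_genomes(
    records: Iterable[GenomeRecord],
    serovar: Optional[str] = None,
    complete_only: bool = False,
) -> List[GenomeRecord]:
    """Keep records that pass the quality filter and optional criteria."""
    selected = []
    for rec in records:
        if not passes_quality(rec):
            continue
        if serovar is not None and rec.serovar.lower() != serovar.lower():
            continue
        if complete_only and not rec.is_complete:
            continue
        selected.append(rec)
    return selected


# Placeholder records with invented values, for illustration only.
examples = [
    GenomeRecord("EXAMPLE_0001", "Typhi", 120.0, 4_800_000, True),
    GenomeRecord("EXAMPLE_0002", "Typhi", 35.0, 4_800_000, True),
    GenomeRecord("EXAMPLE_0003", "Dublin", 80.0, 3_900_000, False),
    GenomeRecord("EXAMPLE_0004", "Dublin", 80.0, 4_900_000, False),
]

kept = select_genomes(examples, serovar="Typhi", complete_only=True)
print([rec.accession for rec in kept])
```

In this sketch, a record with exactly 50x coverage or exactly 4,000,000 bp is rejected, which follows the "higher than" and "over" wording of the abstract; whether the tool itself treats the boundaries this way is not stated in the source.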

Funder

French Ministry of Agriculture, Food and Forestry

Publisher

SAGE Publications

Subject

Applied Mathematics, Computational Mathematics, Computer Science Applications, Molecular Biology, Biochemistry


Cited by 2 articles.