High Performance Integration Pipeline for Viral and Epitope Sequences

Published: 2022-03-21 Issue: 1 Volume: 11 Page: 7
ISSN: 2673-6284
Container-title: BioTech
Language: en
Short-container-title: BioTech

Author:

Alfonsi Tommaso, Pinoli Pietro, Canakoglu Arif

Abstract

With the spread of COVID-19, sequencing laboratories started to share hundreds of sequences daily. However, the lack of a commonly agreed standard across deposition databases hindered the exploration and study of all the viral sequences collected worldwide in a practical and homogeneous way. During the first months of the pandemic, we developed an automatic procedure to collect, transform, and integrate viral sequences of SARS-CoV-2, MERS, SARS-CoV, Ebola, and Dengue from four major database institutions (NCBI, COG-UK, GISAID, and NMDC). This data pipeline allowed the creation of the data exploration interfaces VirusViz and EpiSurf, as well as ViruSurf, one of the largest databases of integrated viral sequences. Almost two years after the first release of the repository, the original pipeline underwent a thorough refinement process and became more efficient, scalable, and general (currently, it also includes epitopes from the IEDB). Thanks to these improvements, we constantly update and expand our integrated repository, encompassing about 9.1 million SARS-CoV-2 sequences at present (March 2022). This pipeline made it possible to design and develop fundamental resources for any researcher interested in understanding the biological mechanisms behind the viral infection. In addition, it plays a crucial role in many analytic and visualization tools, such as ViruSurf, EpiSurf, VirusViz, and VirusLab.

Funder

European Research Council

European Institute of Innovation and Technology

Publisher

MDPI AG

Subject

Applied Microbiology and Biotechnology, Biomedical Engineering, Biochemistry, Bioengineering, Biotechnology

Link

https://www.mdpi.com/2673-6284/11/1/7/pdf

Cited by 2 articles.

1. Conceptual Modeling for Bioinformatics; Reference Module in Life Sciences; 2024

2. Bioinformatics and High-Performance Computing Methods for Deciphering and Fighting COVID-19 (Editorial); BioTech; 2022-10-15