CroMaSt: a workflow for assessing protein domain classification by cross-mapping of structural instances between domain databases and structural alignment

Published:2023-01-01 Issue:1 Volume:3
ISSN:2635-0041
Container-title:Bioinformatics Advances
language:en

Author:

Dhondge Hrishikesh¹, Chauvot de Beauchêne Isaure¹, Devignes Marie-Dominique¹

Affiliation:

1. Université de Lorraine, CNRS, Inria, LORIA, F-54000 Nancy, France

Abstract

Motivation: Protein domains can be viewed as building blocks, essential for understanding structure–function relationships in proteins. However, each domain database classifies protein domains using its own methodology. Thus, in many cases, domain models and boundaries differ from one domain database to another, raising the question of domain definition and enumeration of true domain instances.

Results: We propose an automated iterative workflow to assess protein domain classification by cross-mapping domain structural instances between domain databases and by evaluating structural alignments. CroMaSt (for Cross-Mapper of domain Structural instances) classifies all experimental structural instances of a given domain type into four categories (‘Core’, ‘True’, ‘Domain-like’ and ‘Failed’). CroMaSt is developed in Common Workflow Language and takes advantage of two well-known domain databases with wide coverage: Pfam and CATH. It uses the Kpax structural alignment tool with expert-adjusted parameters. CroMaSt was tested with the RNA Recognition Motif domain type and identified 962 ‘True’ and 541 ‘Domain-like’ structural instances for this domain type. This method solves a crucial issue in domain-centric research and can generate essential information that could be used for synthetic biology and machine-learning approaches to protein domain engineering.

Availability and implementation: The workflow and the Results archive for the CroMaSt runs presented in this article are available from WorkflowHub (doi: 10.48546/workflowhub.workflow.390.2).

Supplementary information: Supplementary data are available at Bioinformatics Advances online.

Funder

Marie Skłodowska-Curie Innovative Training Network

Publisher

Oxford University Press (OUP)

Subject

Computer Science Applications,Genetics,Molecular Biology,Structural Biology

Link

https://academic.oup.com/bioinformaticsadvances/advance-article-pdf/doi/10.1093/bioadv/vbad081/50725348/vbad081.pdf
