Affiliation:
1. Université de Lorraine, CNRS, Inria, LORIA , F-54000 Nancy, France
Abstract
Abstract
Motivation
Protein domains can be viewed as building blocks, essential for understanding structure–function relationships in proteins. However, each domain database classifies protein domains using its own methodology. Thus, in many cases, domain models and boundaries differ from one domain database to the other, raising the question of domain definition and enumeration of true domain instances.
Results
We propose an automated iterative workflow to assess protein domain classification by cross-mapping domain structural instances between domain databases and by evaluating structural alignments. CroMaSt (for Cross-Mapper of domain Structural instances) will classify all experimental structural instances of a given domain type into four different categories (‘Core’, ‘True’, ‘Domain-like’ and ‘Failed’). CroMast is developed in Common Workflow Language and takes advantage of two well-known domain databases with wide coverage: Pfam and CATH. It uses the Kpax structural alignment tool with expert-adjusted parameters. CroMaSt was tested with the RNA Recognition Motif domain type and identifies 962 ‘True’ and 541 ‘Domain-like’ structural instances for this domain type. This method solves a crucial issue in domain-centric research and can generate essential information that could be used for synthetic biology and machine-learning approaches of protein domain engineering.
Availability and implementation
The workflow and the Results archive for the CroMaSt runs presented in this article are available from WorkflowHub (doi: 10.48546/workflowhub.workflow.390.2).
Supplementary information
Supplementary data are available at Bioinformatics Advances online.
Funder
Marie Skłodowska-Curie Innovative Training Network
Publisher
Oxford University Press (OUP)
Subject
Computer Science Applications,Genetics,Molecular Biology,Structural Biology