Abstract
SUMMARY
Automated domain annotation plays a number of important roles in structural informatics and typically involves searching query sequences against Hidden Markov Model (HMM) profiles. This process can be ambiguous or inaccurate when proteins contain domains with non-contiguous residue ranges, and especially when insertional domains are hosted within them. Here we present DomainMapper, an algorithm that accurately assigns a unique domain structure annotation to any query sequence, including those with complex topologies. We validate our domain assignments using the AlphaFold database and confirm that non-contiguity is pervasive (6.5% of all domains in yeast and 2.5% in human). Using this resource, we find that certain folds have strong propensities to be non-contiguous or insertional across the Tree of Life, likely reflecting evolutionary preferences for domain topology. DomainMapper is freely available and can be run with a single command-line call.

HIGHLIGHTS
DomainMapper generates a unique domain structure annotation, including non-contiguous and insertional domains
Automated annotations of non-contiguous domains are validated against the AlphaFold database
DomainMapper can be easily installed and used by non-experts
Certain folds have strong preferences to be non-contiguous or insertional
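The abstract does not state the exact criteria DomainMapper uses to call a domain non-contiguous or insertional. As a rough illustration only, assuming each domain hit is represented as a list of inclusive residue ranges, the two topologies could be flagged as below: a domain is non-contiguous if its ranges leave a gap, and insertional if it lies entirely inside a gap of another domain. The `Domain` class, the `min_gap` threshold and all helper names are hypothetical; this is not DomainMapper's API or its actual algorithm.

```python
from dataclasses import dataclass

# Hypothetical representation: an inclusive, 1-based residue range (start, end).
Segment = tuple[int, int]


@dataclass
class Domain:
    name: str
    segments: list[Segment]


def merge_segments(segments: list[Segment], min_gap: int = 1) -> list[Segment]:
    """Sort segments and merge any separated by fewer than `min_gap` residues.

    `min_gap` is an assumed threshold, not a DomainMapper parameter.
    """
    merged: list[Segment] = []
    for start, end in sorted(segments):
        if merged and start - merged[-1][1] - 1 < min_gap:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def gaps(domain: Domain, min_gap: int = 1) -> list[Segment]:
    """Residue ranges lying between consecutive merged segments of a domain."""
    segs = merge_segments(domain.segments, min_gap)
    return [(a_end + 1, b_start - 1) for (_, a_end), (b_start, _) in zip(segs, segs[1:])]


def is_noncontiguous(domain: Domain, min_gap: int = 1) -> bool:
    """A domain is flagged non-contiguous if it splits into more than one segment."""
    return len(merge_segments(domain.segments, min_gap)) > 1


def is_inserted_in(inner: Domain, host: Domain, min_gap: int = 1) -> bool:
    """True if all residues of `inner` fall inside a single gap of `host`."""
    lo = min(s for s, _ in inner.segments)
    hi = max(e for _, e in inner.segments)
    return any(g_start <= lo and hi <= g_end for g_start, g_end in gaps(host, min_gap))


def classify(domains: list[Domain], min_gap: int = 1) -> dict[str, list[str]]:
    """Attach topology tags to each domain in one protein's annotation."""
    labels: dict[str, list[str]] = {}
    for d in domains:
        tags = []
        if is_noncontiguous(d, min_gap):
            tags.append("non-contiguous")
        if any(is_inserted_in(d, h, min_gap) for h in domains if h is not d):
            tags.append("insertional")
        labels[d.name] = tags
    return labels


if __name__ == "__main__":
    # Toy protein: domain B (81-150) is inserted into the gap of host domain A.
    host = Domain("A", [(1, 80), (151, 230)])
    insert = Domain("B", [(81, 150)])
    tail = Domain("C", [(231, 300)])
    print(classify([host, insert, tail]))
    # {'A': ['non-contiguous'], 'B': ['insertional'], 'C': []}
```

Treating a domain as insertional only when it fits inside one gap of its host keeps the check simple; a real pipeline would also have to resolve overlapping HMM hits before any such tagging, which this sketch does not attempt.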
Publisher
Cold Spring Harbor Laboratory