Optimizing strategy for the discovery of compositionally-biased or low-complexity regions in proteins

Published:2024-01-05 Issue:1 Volume:14
ISSN:2045-2322
Container-title:Scientific Reports
language:en
Short-container-title:Sci Rep

Author:

Harrison Paul M.

Abstract

Proteins can contain tracts dominated by a subset of amino acids and that have a functional significance. These are often termed ‘low-complexity regions’ (LCRs) or ‘compositionally-biased regions’ (CBRs). However, a wide spectrum of compositional bias is possible, and program parameters used to annotate these regions are often arbitrarily chosen. Also, investigators are sometimes interested in longer regions, or sometimes very short ones. Here, two programs for annotating LCRs/CBRs, namely SEG and fLPS, are investigated in detail across the whole expanse of their parameter spaces. In doing so, boundary behaviours are resolved that are used to derive an optimized systematic strategy for annotating LCRs/CBRs. Sets of parameters that progressively annotate or ‘cover’ more of protein sequence space and are optimized for a given target length have been derived. This progressive annotation can be applied to discern the biological relevance of CBRs, e.g., in parsing domains for experimental constructs and in generating hypotheses. It is also useful for picking out candidate regions of interest of a given target length and bias signature, and for assessing the parameter dependence of annotations. This latter application is demonstrated for a set of human intrinsically-disordered proteins associated with cancer.
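The parameter dependence described in the abstract can be illustrated with a toy example. The sketch below is not SEG or fLPS and does not reproduce either program's algorithm. It is a simplified sliding-window Shannon entropy scan, and the function names, window length and entropy thresholds are all hypothetical values chosen for illustration. It only shows how the fraction of a sequence that gets annotated can change as a threshold is loosened.

```python
import math
from collections import Counter


def shannon_entropy(window: str) -> float:
    """Shannon entropy (bits per residue) of the residues in a window."""
    counts = Counter(window)
    n = len(window)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def low_entropy_regions(seq: str, window: int = 12, max_entropy: float = 2.2):
    """Merged half-open (start, end) intervals covered by any window
    whose entropy is at or below max_entropy.

    Illustrative only: `window` and `max_entropy` are assumed parameters,
    not the settings derived in the paper.
    """
    regions = []
    for i in range(len(seq) - window + 1):
        if shannon_entropy(seq[i:i + window]) <= max_entropy:
            if regions and i <= regions[-1][1]:
                # Overlaps or touches the previous region: extend it.
                regions[-1] = (regions[-1][0], i + window)
            else:
                regions.append((i, i + window))
    return regions


def coverage(seq: str, regions) -> float:
    """Fraction of the sequence covered by the annotated regions."""
    return sum(end - start for start, end in regions) / len(seq) if seq else 0.0


if __name__ == "__main__":
    # Made-up sequence: a glutamine-rich tract flanked by mixed residues.
    seq = "MKVLAT" + "Q" * 15 + "ACDEFGHIKLMNPRSTVWY"
    for threshold in (0.5, 1.5, 2.5, 3.0):
        regions = low_entropy_regions(seq, window=12, max_entropy=threshold)
        print(threshold, regions, round(coverage(seq, regions), 2))
```

Because every window that passes a strict threshold also passes a looser one, coverage in this sketch can only stay the same or grow as the threshold is raised, which is a simple analogue of the progressive annotation the abstract describes.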

Funder

Natural Sciences and Engineering Research Council of Canada

Publisher

Springer Science and Business Media LLC

Link

https://www.nature.com/articles/s41598-023-50991-8.pdf

References: 34 articles.

1. Harrison, P. M. Exhaustive assignment of compositional bias reveals universally prevalent biased regions: Analysis of functional associations in human and Drosophila. BMC Bioinform. 7, 441. https://doi.org/10.1186/1471-2105-7-441 (2006).

2. Harrison, P. M. Compositionally biased dark matter in the protein universe. Proteomics 18, e1800069. https://doi.org/10.1002/pmic.201800069 (2018).

3. Hancock, J. M. & Armstrong, J. S. SIMPLE34: An improved and enhanced implementation for VAX and Sun computers of the SIMPLE algorithm for analysis of clustered repetitive motifs in nucleotide sequences. Comput. Appl. Biosci. 10, 67–70. https://doi.org/10.1093/bioinformatics/10.1.67 (1994).

4. Alba, M. M., Laskowski, R. A. & Hancock, J. M. Detecting cryptically simple protein sequences using the SIMPLE algorithm. Bioinformatics 18, 672–678. https://doi.org/10.1093/bioinformatics/18.5.672 (2002).

5. Wootton, J. C. & Federhen, S. Analysis of compositionally biased regions in sequence databases. Methods Enzymol. 266, 554–571 (1996).