Disentangling the complexity of low complexity proteins

Author:

Mier Pablo (1), Paladin Lisanna (2), Tamana Stella (3), Petrosian Sophia (4), Hajdu-Soltész Borbála (5), Urbanek Annika (6), Gruca Aleksandra (7), Plewczynski Dariusz (8, 9), Grynberg Marcin (10), Bernadó Pau (6), Gáspári Zoltán (11), Ouzounis Christos A (4), Promponas Vasilis J (3), Kajava Andrey V (12, 13), Hancock John M (14, 15), Tosatto Silvio C E (2, 16), Dosztanyi Zsuzsanna (5), Andrade-Navarro Miguel A (1)

Affiliation:

1. Institute of Organismic and Molecular Evolution, Johannes Gutenberg University of Mainz, Mainz, Germany

2. Department of Biomedical Science, University of Padova, Padova, Italy

3. Bioinformatics Research Laboratory, Department of Biological Sciences, University of Cyprus, Nicosia, Cyprus

4. Biological Computation and Process Laboratory, Chemical Process & Energy Resources Institute, Centre for Research & Technology Hellas, Thessalonica, Greece

5. MTA-ELTE Lendület Bioinformatics Research Group, Department of Biochemistry, Eötvös Loránd University, Budapest, Hungary

6. Centre de Biochimie Structurale, INSERM, CNRS, Université de Montpellier, Montpellier, France

7. Institute of Informatics, Silesian University of Technology, Gliwice, Poland

8. Center of New Technologies, University of Warsaw, Warsaw, Poland

9. Faculty of Mathematics and Information Science, Warsaw University of Technology, Warsaw, Poland

10. Institute of Biochemistry and Biophysics, Warsaw, Poland

11. Faculty of Information Technology and Bionics, Pázmány Péter Catholic University, Budapest, Hungary

12. Centre de Recherche en Biologie Cellulaire de Montpellier, CNRS-UMR, Institut de Biologie Computationnelle, Université de Montpellier, Montpellier, France

13. Institute of Bioengineering, University ITMO, St. Petersburg, Russia

14. Earlham Institute, Norwich, UK

15. ELIXIR Hub, Wellcome Genome Campus, Hinxton, UK

16. CNR Institute of Neuroscience, Padova, Italy

Abstract

There are multiple definitions for low complexity regions (LCRs) in protein sequences, with all of them broadly considering LCRs as regions with fewer amino acid types compared to an average composition. Following this view, LCRs can also be defined as regions showing composition bias. In this critical review, we focus on the definition of sequence complexity of LCRs and their connection with structure. We present statistics and methodological approaches that measure low complexity (LC) and related sequence properties. Composition bias is often associated with LC and disorder, but repeats, while compositionally biased, might also induce ordered structures. We illustrate this dichotomy, and more generally the overlaps between different properties related to LCRs, using examples. We argue that statistical measures alone cannot capture all structural aspects of LCRs and recommend the combined usage of a variety of predictive tools and measurements. While the methodologies available to study LCRs are already very advanced, we foresee that a more comprehensive annotation of sequences in the databases will enable the improvement of predictions and a better understanding of the evolution and the connection between structure and function of LCRs. This will require the use of standards for the generation and exchange of data describing all aspects of LCRs.

Short abstract

There are multiple definitions for low complexity regions (LCRs) in protein sequences. In this critical review, we focus on the definition of sequence complexity of LCRs and their connection with structure. We present statistics and methodological approaches that measure low complexity (LC) and related sequence properties. Composition bias is often associated with LC and disorder, but repeats, while compositionally biased, might also induce ordered structures. We illustrate this dichotomy, plus overlaps between different properties related to LCRs, using examples.

Funder

Institute of Informatics

National Research Development and Innovation Office

Hungarian Academy of Sciences

European Research Council

European Union

COST Association

János Bolyai Research Scholar

Publisher

Oxford University Press (OUP)

Subject

Molecular Biology, Information Systems

Cited by 72 articles.