SHEPHARD: a modular and extensible software architecture for analyzing and annotating large protein datasets

Published:2023-08-01 Issue:8 Volume:39
ISSN:1367-4811
Container-title:Bioinformatics
language:en

Author:

Ginell Garrett M¹², Flynn Aidan J¹², Holehouse Alex S¹²

Affiliation:

1. Department of Biochemistry and Molecular Biophysics, Washington University School of Medicine, 660 South Euclid Avenue, Saint Louis, MO 63110, United States

2. Center for Biomolecular Condensates, Washington University in St. Louis, 1 Brookings Drive, Saint Louis, MO 63130, United States

Abstract

Motivation: The emergence of high-throughput experiments and high-resolution computational predictions has led to an explosion in the quality and volume of protein sequence annotations at proteomic scales. Unfortunately, sanity checking, integrating, and analyzing complex sequence annotations remains logistically challenging and introduces a major barrier to entry for even superficial integrative bioinformatics.

Results: To address this technical burden, we have developed SHEPHARD, a Python framework that trivializes large-scale integrative protein bioinformatics. SHEPHARD combines an object-oriented hierarchical data structure with database-like features, enabling programmatic annotation, integration, and analysis of complex datatypes. Importantly, SHEPHARD is easy to use and enables a Pythonic interrogation of large-scale protein datasets with millions of unique annotations. We use SHEPHARD to examine three orthogonal proteome-wide questions relating protein sequence to molecular function, illustrating its ability to uncover novel biology.

Availability and implementation: We provide SHEPHARD as both a stand-alone software package (https://github.com/holehouse-lab/shephard) and as a Google Colab notebook with a collection of precomputed proteome-wide annotations (https://github.com/holehouse-lab/shephard-colab).

Funder

Dewpoint Therapeutics, National Science Foundation

Publisher

Oxford University Press (OUP)

Subject

Computational Mathematics, Computational Theory and Mathematics, Computer Science Applications, Molecular Biology, Biochemistry, Statistics and Probability

Link

https://academic.oup.com/bioinformatics/advance-article-pdf/doi/10.1093/bioinformatics/btad488/51038856/btad488.pdf

References: 19 articles.

1. The structural context of posttranslational modifications at a proteome-wide scale;Bludau;PLoS Biol,2022

2. Spontaneous driving forces give rise to protein–RNA condensates with coexisting phases and complex material properties;Boeynaems;Proc Natl Acad Sci USA,2019

3. A concentration-dependent liquid phase separation can cause toxicity upon increased protein expression;Bolognesi;Cell Rep,2016

4. Relating sequence encoded information to form and function of intrinsically disordered proteins;Das;Current Opinion in Structural Biology,2015

5. SWI/SNF senses carbon starvation with a pH-sensitive low-complexity sequence;Gutierrez;Elife,2022

Cited by 5 articles.

1. Protein surface chemistry encodes an adaptive resistance to desiccation;2024-07-29

2. Phosphorylation of disordered proteins tunes local and global intramolecular interactions;2024-06-12

3. Direct prediction of intermolecular interactions driven by disordered regions;2024-06-03

4. Direct prediction of intrinsically disordered protein conformational properties from sequence;Nature Methods;2024-01-31

5. The molecular basis for cellular function of intrinsically disordered protein regions;Nature Reviews Molecular Cell Biology;2023-11-13