Abstract
This study presents a comprehensive approach to detecting circular permutations in the Protein Data Bank (PDB; 287,081 proteins with sequence length under 800, as of 1 January 2024). We systematically analyzed the PDB, using FoldSeek and MMseqs2 for structural and sequence similarity searches. The resulting 143,756,535 candidate pairs were filtered by threshold before further analysis. TM-align, icarus, or plmCP was then used to align the protein structures, refining detection accuracy and enabling precise identification of circular permutations. In total, we obtained 20,801 candidate circular permutation pairs and 3,351 circular permutation proteins (https://github.com/YueHuLab/Circular-permutation-in-PDB). Our methodology provides a robust framework for uncovering circular permutations in protein databases, enhancing our understanding of protein structural variation and evolutionary adaptation.
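For readers unfamiliar with the term, the idea of a circular permutation can be illustrated with a toy example. The sketch below is not the pipeline described above, which relies on structural and sequence alignment tools and tolerates mutations, insertions, and deletions. It only checks the idealized case in which one sequence is an exact rotation of another. The function name `rotation_offset` and the example sequences are hypothetical.

```python
from typing import Optional


def rotation_offset(a: str, b: str) -> Optional[int]:
    """Return k such that b == a[k:] + a[:k], or None if b is not a rotation of a.

    Toy illustration only: real circularly permuted proteins are rarely exact
    rotations, so practical detection needs alignment-based methods.
    """
    if len(a) != len(b):
        return None
    # Every rotation of `a` appears as a substring of `a + a`.
    idx = (a + a).find(b)
    return idx if idx != -1 else None


# Hypothetical sequences: the second is the first with its N-terminal
# half moved to the C-terminus.
print(rotation_offset("MKTAYIAK", "YIAKMKTA"))
```

An offset of 0 means the two sequences are identical, so only a nonzero offset indicates a (trivially exact) circular permutation in this simplified setting.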
Publisher
Cold Spring Harbor Laboratory