CRFalign: A Sequence-Structure Alignment of Proteins Based on a Combination of HMM-HMM Comparison and Conditional Random Fields

Published:2022-06-09 Issue:12 Volume:27 Page:3711
ISSN:1420-3049
Container-title:Molecules
language:en
Short-container-title:Molecules

Author:

Lee Sung Jong, Joo Keehyoung, Sim Sangjin, Lee Juyong, Lee In-Ho, Lee Jooyoung

Abstract

Sequence–structure alignment for protein sequences is an important task for the template-based modeling of 3D structures of proteins. Building a reliable sequence–structure alignment is a challenging problem, especially for remote homologue target proteins. We built a method of sequence–structure alignment called CRFalign, which improves upon a base alignment model based on HMM-HMM comparison by employing pairwise conditional random fields in combination with nonlinear scoring functions of structural and sequence features. The nonlinear scoring part is implemented by a set of gradient boosted regression trees. In addition to sequence profile features, various position-dependent structural features are employed, including secondary structures and solvent accessibilities. Training is performed on reference alignments at the superfamily level or in the twilight zone, chosen from the SABmark benchmark set. We found that the CRFalign method produces a relative improvement in average alignment accuracy on validation sets of the SABmark benchmark. We also tested CRFalign on 51 sequence–structure pairs involving 15 FM target domains of CASP14, where CRFalign leads to an improvement in average modeling accuracy on these hard targets (TM-CRFalign ≃ 42.94%) compared with HHalign (TM-HHalign ≃ 39.05%) and MRFalign (TM-MRFalign ≃ 36.93%). CRFalign was incorporated into our template search framework, called CRFpred, and was tested on a random set of 300 target proteins consisting of Easy, Medium and Hard sets, where it showed reasonable template search performance.
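
As a rough illustration of what a sequence–structure alignment computes, the sketch below runs a global dynamic-programming alignment driven by a pluggable position-pair scoring function. It is not CRFalign: the function names `align` and `toy_score`, the linear gap penalty, and the toy residue and secondary-structure strings are all hypothetical stand-ins. In the method described above, the pair scores would instead come from learned gradient boosted regression trees over profile, secondary structure, and solvent accessibility features inside a pairwise conditional random field.

```python
from typing import Callable, List, Optional, Tuple

Pair = Tuple[Optional[int], Optional[int]]


def align(n: int, m: int, score: Callable[[int, int], float],
          gap: float = -1.0) -> Tuple[float, List[Pair]]:
    """Global alignment of n target positions to m template positions.

    score(i, j) is any position-dependent match score (assumption: higher
    is better); gap is a simple linear gap penalty, not a learned one.
    """
    # best[i][j]: best score aligning the first i target and first j template positions
    best = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        best[i][0] = best[i - 1][0] + gap
    for j in range(1, m + 1):
        best[0][j] = best[0][j - 1] + gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            best[i][j] = max(
                best[i - 1][j - 1] + score(i - 1, j - 1),  # align i with j
                best[i - 1][j] + gap,                      # target position unaligned
                best[i][j - 1] + gap,                      # template position unaligned
            )

    # Trace back from the corner to recover the aligned position pairs.
    pairs: List[Pair] = []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and best[i][j] == best[i - 1][j - 1] + score(i - 1, j - 1):
            pairs.append((i - 1, j - 1))
            i, j = i - 1, j - 1
        elif i > 0 and (j == 0 or best[i][j] == best[i - 1][j] + gap):
            pairs.append((i - 1, None))
            i -= 1
        else:
            pairs.append((None, j - 1))
            j -= 1
    pairs.reverse()
    return best[n][m], pairs


# Toy data (hypothetical): residues plus a secondary-structure string per protein.
target_seq, target_ss = "ACDEF", "HHHEE"
template_seq, template_ss = "ACEF", "HHEE"


def toy_score(i: int, j: int) -> float:
    """Hand-written stand-in for a learned scoring function."""
    s = 2.0 if target_seq[i] == template_seq[j] else -1.0
    s += 1.0 if target_ss[i] == template_ss[j] else 0.0
    return s


total, pairs = align(len(target_seq), len(template_seq), toy_score)
print(total, pairs)
```

Keeping the scoring function separate from the dynamic programming is the point of the sketch: the recursion stays the same whether the scores come from a substitution matrix, an HMM-HMM comparison, or a trained nonlinear model.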

Funder

Ministry of Science and ICT, KOREA

Publisher

MDPI AG

Subject

Chemistry (miscellaneous),Analytical Chemistry,Organic Chemistry,Physical and Theoretical Chemistry,Molecular Medicine,Drug Discovery,Pharmaceutical Science

Link

https://www.mdpi.com/1420-3049/27/12/3711/pdf

References: 39 articles.

1. Improved protein structure prediction using potentials from deep learning

2. Highly accurate protein structure prediction with AlphaFold

3. A machine learning information retrieval approach to protein fold recognition

4. A multi-template combination algorithm for protein comparative modeling

5. Progress and challenges in protein structure prediction

Cited by 2 articles.

1. DeepFold: enhancing protein structure prediction through optimized loss functions, improved template features, and re-optimized energy function;Bioinformatics;2023-11-23

2. Correlation Modeling of Moral Emotion based on Facial Image Emotion Recognition Algorithm;2022 International Conference on Augmented Intelligence and Sustainable Systems (ICAISS);2022-11-24