Abstract
Due to shifts in environmental conditions, mutations, or interactions with other biomolecules, some proteins that would normally be soluble can undergo aggregation, resulting in the formation of clumps of amyloid fibrils. Understanding this phenomenon is of paramount importance, not only because of its association with various diseases (including Alzheimer’s disease), but also because of increasingly abundant evidence for its functional roles. Numerous studies have demonstrated that the propensity to form amyloids is encoded in the amino acid sequence, and this finding has paved the way for the development of several computational predictors of amyloidogenicity. The ultimate objective of computational methods is to accurately predict the formation of disease-related and functionally relevant amyloids that occur in vivo. These amyloid fibrils are known to form very specific “cross-β” structures of protein regions longer than about 15 residues. Remarkably, despite the significance of naturally occurring amyloids, there has been a lack of datasets specifically dedicated to them. Hence, we built Cross-Beta DB, a database composed of cross-β amyloids formed in natural conditions. This database is expected to be indispensable for benchmarking amyloid predictors. We used Cross-Beta DB to train and benchmark several machine learning algorithms for amyloid prediction. The best-performing of these, the random-forest-based Cross-Beta RF Predictor, demonstrated superior performance over the other existing methods, fostering high expectations for an improved prediction of naturally occurring amyloids.
Publisher
Cold Spring Harbor Laboratory
Cited by
2 articles.