Affiliation:
1. School of Computer Science and Information Technology, University College Cork, Cork, Ireland
2. SFI Centre for Research Training in Artificial Intelligence, University College Cork, Cork, Ireland
3. Centre for Advanced Photonics and Process Analytics, Munster Technological University, Cork, Ireland
4. Faculty of Mathematics and Informatics, Transylvania University of Brasov, Brasov, Romania
Abstract
In the burgeoning field of proteins, the effective analysis of intricate protein data remains a formidable challenge, necessitating advanced computational tools for data processing, feature extraction, and interpretation. This study introduces ProteinFlow, an innovative framework designed to revolutionize feature engineering in protein data analysis. ProteinFlow stands out by offering enhanced efficiency in data collection and preprocessing, along with advanced capabilities in feature extraction, directly addressing the complexities inherent in multidimensional protein data sets. Through a comparative analysis, ProteinFlow demonstrated a significant improvement over traditional methods, notably reducing data preprocessing time and expanding the scope of biologically significant features identified. The framework's parallel data processing strategy and advanced algorithms ensure not only rapid data handling but also the extraction of comprehensive, meaningful insights from protein sequences, structures, and interactions. Furthermore, ProteinFlow exhibits remarkable scalability, adeptly managing large-scale data sets without compromising performance, a crucial attribute in the era of big data.
Funder
Science Foundation Ireland