Affiliation:
1. Aristotle University of Thessaloniki, Greece
Abstract
This chapter presents Data Mining techniques for knowledge extraction in proteomics, taking into account both the particular features of most proteomics problems (such as data retrieval and system complexity) and the opportunities and constraints of a Grid environment. It discusses how new and potentially useful knowledge can be extracted from proteomics data while using Grid resources transparently. Protein classification is introduced as a current research issue in proteomics that also exhibits most of the domain-specific traits. An overview of common and custom-made Data Mining algorithms is provided, with emphasis on the specific needs of protein classification problems. A unified methodology is presented for complex Data Mining processes on the Grid, highlighting the different application types and the benefits and drawbacks of each. Finally, the methodology is validated through real-world case studies deployed on the EGEE Grid environment.