Author:
Cuff James A.,Coates Guy M.P.,Cutts Tim J.R.,Rae Mark
Abstract
Ensembl is a software project to automatically annotate large eukaryotic genomes and release them freely into the public domain. The project currently automatically annotates 10 complete genomes. This makes very large demands on compute resources, due to the vast number of sequence comparisons that need to be executed. To circumvent the financial outlay often associated with classical supercomputing environments, farms of multiple, lower-cost machines have now become the norm and have been deployed successfully with this project. The architecture and design of farms containing hundreds of compute nodes is complex and nontrivial to implement. This study will define and explain some of the essential elements to consider when designing such systems. Server architecture and network infrastructure are discussed with a particular emphasis on solutions that worked and those that did not (often with fairly spectacular consequences). The aim of the study is to give the reader, who may be implementing a large-scale biocompute project, an insight into some of the pitfalls that may be waiting ahead.
Publisher
Cold Spring Harbor Laboratory
Subject
Genetics (clinical),Genetics
Reference8 articles.
1. Fox, G., Williams, R., and Messina, P. 1994. Parallel computing works. Morgan Kaufmann, San Francisco, CA.
2. Knaff, A. 2003. Udpcast. http://udpcast.linux.lu .
3. A general method applicable to the search for similarities in the amino acid sequence of two proteins
Cited by
16 articles.
订阅此论文施引文献
订阅此论文施引文献,注册后可以免费订阅5篇论文的施引文献,订阅后可以查看论文全部施引文献