Author:
Afgan Enis,Baker Dannon,Coraor Nate,Chapman Brad,Nekrutenko Anton,Taylor James
Abstract
Abstract
Background
Widespread adoption of high-throughput sequencing has greatly increased the scale and sophistication of computational infrastructure needed to perform genomic research. An alternative to building and maintaining local infrastructure is “cloud computing”, which, in principle, offers on demand access to flexible computational infrastructure. However, cloud computing resources are not yet suitable for immediate “as is” use by experimental biologists.
Results
We present a cloud resource management system that makes it possible for individual researchers to compose and control an arbitrarily sized compute cluster on Amazon’s EC2 cloud infrastructure without any informatics requirements. Within this system, an entire suite of biological tools packaged by the NERC Bio-Linux team (http://nebc.nerc.ac.uk/tools/bio-linux) is available for immediate consumption. The provided solution makes it possible, using only a web browser, to create a completely configured compute cluster ready to perform analysis in less than five minutes. Moreover, we provide an automated method for building custom deployments of cloud resources. This approach promotes reproducibility of results and, if desired, allows individuals and labs to add or customize an otherwise available cloud system to better meet their needs.
Conclusions
The expected knowledge and associated effort with deploying a compute cluster in the Amazon EC2 cloud is not trivial. The solution presented in this paper eliminates these barriers, making it possible for researchers to deploy exactly the amount of computing power they need, combined with a wealth of existing analysis software, to handle the ongoing data deluge.
Publisher
Springer Science and Business Media LLC
Subject
Applied Mathematics,Computer Science Applications,Molecular Biology,Biochemistry,Structural Biology
Reference17 articles.
1. Armbrust M, Fox A, Griffith R, Joseph AD, Katz R, Konwinski A, Lee G, Patterson D, Rabkin A, Stoica I, Zaharia M: Above the Clouds: A Berkeley View of Cloud Computing. In Book Above the Clouds: A Berkeley View of Cloud Computing. Edited by: Editor ed.^eds.. University of California at Berkeley; 2009:23.
2. Langmead B, Schatz MC, Lin J, Pop M, Salzberg SL: Searching for SNPs with cloud computing. Genome Biol 2009, 10: R134. 10.1186/gb-2009-10-11-r134
3. Schatz MC: CloudBurst: highly sensitive read mapping with MapReduce. Bioinformatics 2009, 25: 1363–1369. 10.1093/bioinformatics/btp236
4. Wall DP, Kudtarkar P, Fusaro VA, Pivovarov R, Patil P, Tonellato PJ: Cloud computing for comparative genomics. BMC Bioinformatics 2010, 11: 259.
5. Schatz MC, Langmead B, Salzberg SL: Cloud computing and the DNA data race. Nat Biotechnol 2010, 28: 691–693. 10.1038/nbt0710-691
Cited by
122 articles.
订阅此论文施引文献
订阅此论文施引文献,注册后可以免费订阅5篇论文的施引文献,订阅后可以查看论文全部施引文献