Abstract
This paper reviews the use of the Hadoop platform in Structural Bioinformatics applications. Specifically, we review a number of Hadoop implementations of high-throughput analyses, e.g. ligand-protein docking and structural alignment, and compare their scalability with that of other batch schedulers and MPI. We find that most of these deployments call existing executables from MapReduce rather than rewriting the algorithms. Scalability relative to other batch schedulers is variable, a picture complicated by the general lack of direct comparisons on the same platform. There is, however, some evidence that MPI implementations scale better than Hadoop. A significant barrier to the use of the Hadoop ecosystem is the difficulty of its interface and of configuring a resource to run Hadoop. This will improve over time as interfaces to Hadoop (e.g. Spark) improve, usage of cloud platforms (e.g. Azure and AWS) increases, and approaches such as the Workflow Definition Language are taken up.
Publisher
Cold Spring Harbor Laboratory