Affiliation:
1. Xerox Research Center Webster, USA
2. Xerox Research Center India, India
Abstract
In the modern information era, the amount of data has exploded, and current trends indicate that this growth will continue exponentially. This prevalent, enormous volume of data, referred to as big data, has given rise to the problem of finding the "needle in the haystack" (i.e., extracting meaningful information from big data). Many researchers and practitioners are focusing on big data analytics to address this problem. One of the major issues in this regard is the computational requirement of big data analytics. In recent years, the proliferation of loosely coupled distributed computing infrastructures (e.g., modern public, private, and hybrid clouds; high-performance computing clusters; and grids) has made high computing capability available for large-scale computation. This has allowed the execution of big data analytics to gather pace across organizations and enterprises. However, even with such high computing capability, efficiently extracting valuable information from astronomically large data sets remains a major challenge. Hence, unprecedented scalability of performance is required for the execution of big data analytics. A key question in this regard is how to maximally leverage the computing capability of the aforementioned loosely coupled distributed infrastructures to ensure fast and accurate execution of big data analytics. To that end, this chapter focuses on synchronous parallelization of big data analytics over a distributed system environment to optimize performance.