Affiliation:
1. Beijing University of Posts and Telecommunications
Abstract
With the widespread use of smartphones in China, all incoming and routed packet streams pass through the Content Distribution Service (CDS) switching centers. Each center handles up to 1.5 terabytes of data arriving every day. Normally, the job of a switch is only to transmit data, and an ordinary database cannot handle such a massive dataset or complex ad-hoc queries. In this paper, we propose DeepMR, a MapReduce-based deep service analysis system built on the Hive/Hadoop frameworks. DeepMR uses the distributed file system HDFS for fast data sharing and querying. DeepMR also optimizes scheduling for switch analysis jobs and supports fault tolerance for the entire workflow. Our results show that the model achieves higher efficiency.
Publisher
Trans Tech Publications, Ltd.