Affiliation:
1. The University of West Florida, USA
2. University of West Florida, USA
Abstract
This study presents an efficient way to deal with discrete as well as continuous values in Big Data in a parallel Naïve Bayes implementation on Hadoop's MapReduce environment. Two approaches were taken: (i) discretizing continuous values using a binning method; and (ii) using a multinomial distribution for probability estimation of discrete values and a Gaussian distribution for probability estimation of continuous values. The models were analyzed and compared for performance with respect to run time and classification accuracy for varying data sizes, data block sizes, and map memory sizes.
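The two approaches named in the abstract can be illustrated with a minimal single-machine sketch. It is not the authors' Hadoop implementation. The equal-width binning rule, the Laplace smoothing constant `alpha`, the variance floor, and the names `equal_width_bin` and `MixedNaiveBayes` are assumptions made for illustration; the abstract does not specify any of them. The model keeps only counts, sums, and sums of squares, all of which are additive and could therefore in principle be accumulated per data block and merged afterwards.

```python
import math
from collections import Counter, defaultdict


def equal_width_bin(value, lo, hi, n_bins):
    """Approach (i): map a continuous value to a bin index in [0, n_bins - 1].

    Equal-width binning is an assumption; the abstract only says "a binning method".
    """
    if hi <= lo:
        return 0
    idx = int((value - lo) / (hi - lo) * n_bins)
    return min(max(idx, 0), n_bins - 1)  # clamp values at or beyond the range edges


class MixedNaiveBayes:
    """Approach (ii): frequency estimates for discrete features, Gaussian for continuous."""

    def __init__(self, discrete_idx, continuous_idx, alpha=1.0):
        self.discrete_idx = discrete_idx      # column positions of discrete features
        self.continuous_idx = continuous_idx  # column positions of continuous features
        self.alpha = alpha                    # Laplace smoothing (assumed, not from the paper)

    def fit(self, rows, labels):
        self.class_counts = Counter(labels)
        self.n = len(labels)
        self.counts = defaultdict(Counter)    # (class, column) -> value -> count
        self.values = defaultdict(set)        # column -> distinct values seen
        sums = defaultdict(float)             # (class, column) -> sum of x
        squares = defaultdict(float)          # (class, column) -> sum of x^2
        for row, c in zip(rows, labels):
            for j in self.discrete_idx:
                self.counts[(c, j)][row[j]] += 1
                self.values[j].add(row[j])
            for j in self.continuous_idx:
                sums[(c, j)] += row[j]
                squares[(c, j)] += row[j] ** 2
        self.mean, self.var = {}, {}
        for (c, j), s in sums.items():
            n_c = self.class_counts[c]
            m = s / n_c
            self.mean[(c, j)] = m
            # Population variance, floored to avoid division by zero.
            self.var[(c, j)] = max(squares[(c, j)] / n_c - m * m, 1e-9)
        return self

    def log_posterior(self, row, c):
        """Unnormalized log P(c | row) under the naive independence assumption."""
        n_c = self.class_counts[c]
        lp = math.log(n_c / self.n)
        for j in self.discrete_idx:
            k = len(self.values[j])
            count = self.counts[(c, j)][row[j]]
            lp += math.log((count + self.alpha) / (n_c + self.alpha * k))
        for j in self.continuous_idx:
            m, v = self.mean[(c, j)], self.var[(c, j)]
            lp += -0.5 * math.log(2 * math.pi * v) - (row[j] - m) ** 2 / (2 * v)
        return lp

    def predict(self, row):
        return max(self.class_counts, key=lambda c: self.log_posterior(row, c))
```

To use approach (i) with the same class, each continuous column would first be passed through `equal_width_bin` and then listed in `discrete_idx` instead of `continuous_idx`.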
Subject
Decision Sciences (miscellaneous), Information Systems
Cited by
6 articles.