Affiliation:
1. Faculty of Computer Science, Universitas Mercu Buana, Jakarta Barat, Indonesia
Abstract
Within an organization, statements containing opinions and complaints about a service or program run by that organization can be processed using machine learning, and the results can be used by the organization to improve its quality. This research attempted to classify reports from social media as complaint or non-complaint using two machine learning algorithms, Logistic Regression (LR) and eXtreme Gradient Boosting (XGBoost). The Logistic Regression model uses CountVectorizer and TfidfVectorizer feature extraction. The XGBoost algorithm has multiple parameters, so it can be improved by tuning them, namely eta (learning rate), gamma, max_depth, min_child_weight, subsample, colsample_bytree, and alpha. As a result, the best parameter values for XGBoost are 'reg_alpha': 0.01, 'colsample_bytree': 0.9, 'learning_rate': 0.5, 'min_child_weight': 1, 'subsample': 0.8, 'max_depth': 3, 'gamma': 0.0, for which the computational time is 13870.012468 and the best accuracy achieved is 0.927943760984. Furthermore, the performance evaluation results for Logistic Regression using TfidfVectorizer and CountVectorizer feature extraction are 0.9181 and 0.9356, respectively.
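The two Logistic Regression setups described above might be sketched as follows. This is a minimal illustration assuming scikit-learn, not the authors' implementation: the toy reports, their labels, and the helper name fit_lr are hypothetical, and max_iter=1000 is an assumed setting. The dictionary at the end only restates the best XGBoost parameters reported in the abstract.

```python
from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Hypothetical toy reports (not the paper's data); 1 = complaint, 0 = non-complaint.
texts = [
    "the service is very slow and nobody answers",
    "my report has been ignored for weeks",
    "the road is still broken after the repair",
    "thank you for the quick response",
    "the new office schedule starts on monday",
    "great program, very helpful staff",
]
labels = [1, 1, 1, 0, 0, 0]


def fit_lr(vectorizer):
    """Fit Logistic Regression on features from the given vectorizer (hypothetical helper)."""
    model = make_pipeline(vectorizer, LogisticRegression(max_iter=1000))
    model.fit(texts, labels)
    return model


# Same classifier, two feature extraction methods.
count_model = fit_lr(CountVectorizer())   # raw term counts
tfidf_model = fit_lr(TfidfVectorizer())   # TF-IDF weighted terms

new_report = ["the service is slow"]
count_pred = count_model.predict(new_report)
tfidf_pred = tfidf_model.predict(new_report)

# Best XGBoost parameters as reported in the abstract.
best_xgb_params = {
    "reg_alpha": 0.01,
    "colsample_bytree": 0.9,
    "learning_rate": 0.5,
    "min_child_weight": 1,
    "subsample": 0.8,
    "max_depth": 3,
    "gamma": 0.0,
}
```

If the xgboost package is available, these values could presumably be passed as keyword arguments to xgboost.XGBClassifier(**best_xgb_params). How the original search over these parameters was run is not stated in the abstract.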