Affiliation:
1. Department of CSE & IT, Jaypee University of Information Technology, Solan, H.P., India
2. Department of ECE, Jaypee University of Information Technology, Solan, H.P., India
Abstract
Aims: Text classification has emerged as an important approach to advancing Natural Language Processing (NLP) applications that work with text available on the web. Many applications for analyzing such text have been proposed in the literature.
Background: With the help of deep learning, NLP has achieved great success in automatically sorting text data into predefined classes, but this process is expensive and time-consuming.
Objectives: To overcome this problem, this paper studies and implements various machine learning techniques to build an automated system for movie review classification.
Methodology: The proposed methodology uses the Bidirectional Encoder Representations from Transformers (BERT) model for data preparation and makes predictions with several machine learning algorithms: XGBoost, support vector machine, logistic regression, naïve Bayes, and a neural network. The algorithms are compared on accuracy, precision, recall, and F1 score.
Results: The 2-hidden-layer neural network outperforms the other models, achieving an F1 score above 0.90 within the first 15 epochs and 0.99 in just 40 epochs on the IMDB dataset, which greatly reduces training time.
Conclusion: 100% accuracy is attained using a neural network, resulting in a 15% accuracy improvement and a 14.6% F1 score improvement over logistic regression.
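The four metrics named in the methodology can all be derived from the confusion-matrix counts of a binary classifier. The following is a minimal sketch of that calculation, not the authors' code: the function name `binary_metrics` and the label lists are hypothetical and chosen only for illustration (1 = positive review, 0 = negative review).

```python
def binary_metrics(y_true, y_pred, positive=1):
    """Return accuracy, precision, recall and F1 for binary labels."""
    pairs = list(zip(y_true, y_pred))
    # Confusion-matrix counts with respect to the positive class.
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = len(pairs) - tp - fp - fn

    # Guard every ratio against an empty denominator.
    accuracy = (tp + tn) / len(pairs) if pairs else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Hypothetical labels, not taken from the IMDB experiments.
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 1, 0, 0, 0, 1, 1, 0]
metrics = binary_metrics(y_true, y_pred)
print(metrics)
```

F1 is the harmonic mean of precision and recall, so it stays low whenever either of the two is low, which is why it is commonly reported alongside accuracy.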
Publisher
Bentham Science Publishers Ltd.