Affiliations:
1. Department of Mathematics, State University of New York Cortland, Cortland, NY 13045, USA
2. Department of Computer Science, Central Michigan University, Mt Pleasant, MI 48859, USA
3. Department of Physics and Computer Science, Xavier University of Louisiana, New Orleans, LA 70125, USA
Abstract
Predicting the direction of stock market movement is a challenging task because of the market's fuzzy, chaotic, volatile, nonlinear, and complex nature. However, with advances in artificial intelligence, abundant data, and improved computational capabilities, it is now feasible to build robust models that predict stock market movement accurately. This study constructs a predictive model that uses news headlines to predict the direction of stock market movement. It compares five supervised machine learning classification algorithms, namely logistic regression (LR), support vector machine (SVM), random forest (RF), extreme gradient boosting (XGBoost), and artificial neural network (ANN), on the task of predicting the next day's direction of movement of the closing price of the Nepal Stock Exchange (NEPSE) index. Sentiment scores for the news headlines are computed using the Valence Aware Dictionary and sEntiment Reasoner (VADER) and the TextBlob sentiment analyzer. Model performance is evaluated using sensitivity, specificity, accuracy, and the area under the receiver operating characteristic (ROC) curve (AUC). Experimental results show that all five models perform equally well when using sentiment scores from the TextBlob analyzer. Likewise, all models perform almost identically when using sentiment scores from the VADER analyzer, apart from minor differences in AUC between SVM and LR and between SVM and ANN. Moreover, the models perform relatively better with TextBlob sentiment scores than with VADER sentiment scores. Statistical tests further validate these findings.
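The prediction target and the evaluation metrics named in the abstract can be made concrete with a short sketch. It is not the study's code. It assumes the binary target for day t is 1 when the next day's NEPSE close is strictly higher than day t's close and 0 otherwise (counting an unchanged close as 0 is our assumption, not a rule stated in the study). It also treats "up" as the positive class, computes sensitivity, specificity, and accuracy from the confusion matrix, and computes AUC through its Mann-Whitney interpretation. The helper names (`next_day_direction`, `confusion_counts`, `threshold_metrics`, `auc`), the 0.5 threshold, and the toy prices and scores are hypothetical. Headline sentiment scoring (VADER, TextBlob) and model training are omitted.

```python
from typing import Dict, List, Sequence, Tuple


def next_day_direction(close: Sequence[float]) -> List[int]:
    """Label day t as 1 if close[t + 1] > close[t], else 0.

    Assumption (not stated in the study): an unchanged close is labeled 0.
    The last day has no next-day close, so it gets no label.
    """
    return [1 if close[t + 1] > close[t] else 0 for t in range(len(close) - 1)]


def confusion_counts(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[int, int, int, int]:
    """Return (TP, FN, TN, FP), treating 1 ("up") as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    return tp, fn, tn, fp


def threshold_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Sensitivity, specificity, and accuracy from hard 0/1 predictions.

    Assumes both classes occur in y_true (otherwise a denominator is zero).
    """
    tp, fn, tn, fp = confusion_counts(y_true, y_pred)
    return {
        "sensitivity": tp / (tp + fn),  # true positive rate: share of up days caught
        "specificity": tn / (tn + fp),  # true negative rate: share of non-up days caught
        "accuracy": (tp + tn) / (tp + fn + tn + fp),
    }


def auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """ROC AUC as P(score of a random positive > score of a random negative).

    This is the Mann-Whitney form of AUC; tied scores count as 1/2.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Toy closing prices and model scores, invented for illustration only.
    close = [100.0, 101.5, 101.0, 102.0, 102.0, 103.0]
    y_true = next_day_direction(close)  # [1, 0, 1, 0, 1]
    scores = [0.8, 0.4, 0.35, 0.3, 0.9]  # hypothetical predicted P(up) per day
    y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed 0.5 threshold
    print(threshold_metrics(y_true, y_pred))
    print(auc(y_true, scores))
```

Sensitivity, specificity, and accuracy depend on the chosen threshold, while AUC uses the raw scores and summarizes ranking quality across all thresholds. This is why the abstract can report near-identical accuracy alongside small AUC differences between models. The pairwise AUC loop is quadratic in the test-set size, which is fine for a sketch. A sort-based rank computation, or a library routine, scales better.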