Classification of texts on emergency situations in Almaty
Published: 2023-04-30
Issue: 4
Volume: 327
Pages: 23-31
ISSN: 2224-5243
Container-title: Kompleksnoe Ispolʹzovanie Mineralʹnogo syrʹâ/Complex Use of Mineral Resources/Mineraldik Shikisattardy Keshendi Paidalanu
Short-container-title: KIMS/CUMR/MShKP
Author:
Andirov M.Y., Assan Zh.Zh., Nopembri S., Seilkhan A.M., Myrzakhmetov D.E.
Abstract
Text classification is a process comprising several stages and approaches for effectively classifying structurally diverse texts. In this article, machine learning algorithms, namely the support vector machine (SVM), logistic regression, and the k-nearest neighbors (k-NN) method, are applied to classify texts collected from emergency news sites in Almaty. The data collection stage and the subsequent data processing played a key role in the experiment. Before classification, the data set was preprocessed in several steps: stop-word removal, tokenization, stemming, lemmatization, feature extraction, and construction of feature vectors. The data were gathered automatically from open sources by a script. Performance metrics were obtained for each algorithm, allowing a comparative analysis between them, and the experimental results show that the logistic-regression classifier outperforms the other algorithms.
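The abstract names the pipeline stages and classifiers but gives no code. As a rough illustration only, the Python sketch below chains preprocessing (tokenization, stop-word removal, stemming), TF-IDF feature vectors, and two of the three classifiers (k-NN and softmax logistic regression), then reports test accuracy for each. Everything concrete here is an assumption, not the authors' method: the English toy sentences and the labels `fire`, `flood`, `traffic` are hypothetical (the paper's corpus comes from Almaty emergency news sites and its categories are not listed in the abstract), crude suffix stripping stands in for real stemming and lemmatization, the SVM is omitted for brevity, the logistic regression has no intercept and is trained by plain stochastic gradient descent, and all function names are illustrative.

```python
import math
import re
from collections import Counter

# Hypothetical English stop-word list and toy corpus; the paper's actual corpus,
# language, stop words and category labels are not given here.
STOP_WORDS = {"a", "an", "the", "in", "on", "at", "and", "after", "with", "two"}
SUFFIXES = ("ing", "ed", "es", "s")  # crude stand-in for a real stemmer/lemmatizer

TRAIN = [
    ("Fire broke out in a warehouse", "fire"),
    ("Firefighters extinguished the burning house", "fire"),
    ("Smoke and fire reported in an apartment", "fire"),
    ("River flooding on the streets after heavy rain", "flood"),
    ("Flood water covered the roads", "flood"),
    ("Heavy rain caused flooding in the district", "flood"),
    ("Car crash on the highway", "traffic"),
    ("Two cars collided at the crossroads", "traffic"),
    ("Traffic accident with a bus", "traffic"),
]
TEST = [("Fire in a house", "fire"), ("Flooding after rain", "flood"),
        ("Car accident on the highway", "traffic")]


def stem(token):
    """Strip the first matching suffix, keeping a stem of at least 3 letters."""
    for suffix in SUFFIXES:
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def preprocess(text):
    """Tokenize, lowercase, remove stop words, then stem."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [stem(t) for t in tokens if t not in STOP_WORDS]


def fit_idf(docs):
    """Smoothed inverse document frequency of each term in the training docs."""
    df = Counter(term for doc in docs for term in set(doc))
    return {t: math.log((1 + len(docs)) / (1 + c)) + 1 for t, c in df.items()}


def tfidf(doc, idf):
    """L2-normalised sparse TF-IDF vector; terms unseen in training are dropped."""
    vec = {t: c * idf[t] for t, c in Counter(doc).items() if t in idf}
    norm = math.sqrt(sum(w * w for w in vec.values())) or 1.0
    return {t: w / norm for t, w in vec.items()}


def dot(u, v):
    """Sparse dot product; equals cosine similarity for L2-normalised vectors."""
    return sum(w * v.get(t, 0.0) for t, w in u.items())


def knn_predict(x, train_vecs, train_labels, k=3):
    """Majority vote among the k most cosine-similar training vectors."""
    ranked = sorted(zip(train_vecs, train_labels), key=lambda p: dot(x, p[0]), reverse=True)
    return Counter(label for _, label in ranked[:k]).most_common(1)[0][0]


def train_logreg(vecs, labels, epochs=100, lr=0.5):
    """Multinomial (softmax) logistic regression, no intercept, trained with SGD."""
    weights = {c: {} for c in sorted(set(labels))}
    for _ in range(epochs):
        for x, y in zip(vecs, labels):
            scores = {c: dot(x, w) for c, w in weights.items()}
            top = max(scores.values())  # subtract the max for numerical stability
            exps = {c: math.exp(s - top) for c, s in scores.items()}
            total = sum(exps.values())
            for c, w in weights.items():
                grad = exps[c] / total - (1.0 if c == y else 0.0)
                for t, v in x.items():
                    w[t] = w.get(t, 0.0) - lr * grad * v
    return weights


def compare(train, test, k=3):
    """Accuracy of k-NN and logistic regression on the same TF-IDF features."""
    train_docs = [preprocess(text) for text, _ in train]
    idf = fit_idf(train_docs)
    train_vecs = [tfidf(d, idf) for d in train_docs]
    train_labels = [label for _, label in train]
    weights = train_logreg(train_vecs, train_labels)
    predictors = {
        "k-NN": lambda x: knn_predict(x, train_vecs, train_labels, k),
        "logistic regression": lambda x: max(weights, key=lambda c: dot(x, weights[c])),
    }
    test_pairs = [(tfidf(preprocess(text), idf), label) for text, label in test]
    return {name: sum(f(x) == y for x, y in test_pairs) / len(test_pairs)
            for name, f in predictors.items()}


if __name__ == "__main__":
    print(compare(TRAIN, TEST))  # {'k-NN': 1.0, 'logistic regression': 1.0}
```

Both classifiers reach perfect accuracy here only because the toy categories share no vocabulary; this says nothing about the paper's results. On a real corpus, scikit-learn's `TfidfVectorizer`, `LogisticRegression`, `KNeighborsClassifier` and `LinearSVC` would be the usual tools; the hand-written version just makes each stage visible.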
Publisher
Institute of Metallurgy and Ore Benefication (Publications)
Subject
Metals and Alloys, Mechanical Engineering, Mechanics of Materials