Affiliation:
1. Eskişehir Teknik Üniversitesi
Abstract
News categorization, a common application of text classification, is the task of automatically annotating news articles with predefined categories. With the rise of deep learning in machine learning, neural embedding models have been widely used to capture hidden relationships and similarities among textual representations of news articles. In this study, we treat Turkish news categorization as an ad-hoc retrieval task and investigate how effectively paragraph vector models can compute and exploit document-level similarities between Turkish news articles. We propose an ensemble categorization approach with three main stages: document processing, paragraph vector learning, and document similarity estimation. Extensive experiments on the TTC-3600 dataset show that the proposed system reaches up to 93.5% classification accuracy, which compares favorably with the baseline and state-of-the-art methods. Moreover, the Distributed Bag of Words version of Paragraph Vectors outperforms the Distributed Memory Model of Paragraph Vectors in both accuracy and computational performance.
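To illustrate the retrieval view of categorization described in the abstract, the following is a minimal sketch of the document similarity estimation stage only. It assumes that document vectors have already been learned by some paragraph vector model; the toy three-dimensional vectors, the category labels, the function names `cosine` and `categorize`, and the choice of a majority vote over the top `k` most similar documents are all illustrative assumptions, not the authors' ensemble method.

```python
import math
from collections import Counter


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # treat a zero vector as similar to nothing
    return dot / (norm_u * norm_v)


def categorize(query_vec, labeled_vecs, k=3):
    """Rank labeled documents by similarity to the query, then vote over the top k."""
    ranked = sorted(
        labeled_vecs,
        key=lambda item: cosine(query_vec, item[0]),
        reverse=True,
    )
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


# Made-up vectors standing in for learned paragraph vectors (assumption).
train = [
    ([0.9, 0.1, 0.0], "sports"),
    ([0.8, 0.2, 0.1], "sports"),
    ([0.1, 0.9, 0.2], "economy"),
    ([0.0, 0.8, 0.3], "economy"),
    ([0.1, 0.2, 0.9], "health"),
]

predicted = categorize([0.85, 0.15, 0.05], train, k=3)
print(predicted)  # prints "sports"
```

In this sketch the unlabeled article plays the role of a query and the labeled collection plays the role of the retrieval corpus, so the predicted category depends only on which labeled documents rank highest.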
Publisher
Anadolu Universitesi Bilim ve Teknoloji Dergisi-A: Uygulamali Bilimler ve Muhendislik