Document Classification with Contextually Enriched Word Embeddings

Published:2024-03-01 Issue:1 Volume:12 Page:90-97
ISSN:2147-284X
Container-title:Balkan Journal of Electrical and Computer Engineering

Author:

Mahmood Raad Saadi¹, Bakal Mehmet Gökhan², Akbaş Ayhan³

Affiliation:

1. Çankırı Karatekin University

2. Abdullah Gül University

3. University of Surrey

Abstract

Text classification has a wide range of applications, such as classifying articles, social media posts, and sentiments. Because it is a natural language processing task, machine learning and deep learning techniques are used intensively to solve it. One common approach uses discriminative word features, such as Bag-of-Words and n-grams. Another powerful approach uses neural network-based models (specifically deep learning models) that operate at the sentence, word, or character level. In this study, we propose a novel approach that classifies documents using contextually enriched word embeddings, in which each word's embedding is enriched with its neighboring words obtained from word trigrams. In the experiments, the well-known Web of Science dataset is used to evaluate the proposed approach. We built various models with and without the proposed approach and compared their performance. The experiments showed that the proposed neighborhood-based word embedding enrichment has decent potential for use in further studies.
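
The abstract does not state how neighboring words are combined with a word's own embedding, so the following is only a minimal sketch of one possible reading, not the authors' method. It assumes that each token's vector is its own embedding plus a weighted mean of its immediate left and right neighbors (the trigram centered on that token). The function names, the `neighbor_weight` parameter, and the zero-vector fallback for unknown words are all hypothetical choices made for illustration.

```python
from typing import Dict, List

Vector = List[float]


def enrich_with_trigram_neighbors(
    tokens: List[str],
    embeddings: Dict[str, Vector],
    dim: int,
    neighbor_weight: float = 0.5,  # hypothetical mixing weight, not from the paper
) -> List[Vector]:
    """Return one context-enriched vector per token.

    Assumption: enriched = own vector + neighbor_weight * mean(neighbor vectors),
    where the neighbors are the tokens directly left and right of the current
    token, i.e. the other two members of the trigram centered on it.
    """
    zero = [0.0] * dim
    # Unknown words fall back to a zero vector (an illustrative choice).
    base = [embeddings.get(tok, zero) for tok in tokens]

    enriched: List[Vector] = []
    for i, own in enumerate(base):
        # Tokens at the document edges have only one neighbor (or none).
        neighbors = [base[j] for j in (i - 1, i + 1) if 0 <= j < len(base)]
        if not neighbors:
            enriched.append(list(own))
            continue
        mean = [sum(vals) / len(neighbors) for vals in zip(*neighbors)]
        enriched.append([o + neighbor_weight * m for o, m in zip(own, mean)])
    return enriched


def document_vector(vectors: List[Vector], dim: int) -> Vector:
    """Average the token vectors into one fixed-size document feature vector."""
    if not vectors:
        return [0.0] * dim
    return [sum(vals) / len(vectors) for vals in zip(*vectors)]
```

In this sketch, the averaged document vector could be passed to any classifier, and the "without enrichment" baseline corresponds to setting `neighbor_weight` to 0. Concatenating the neighbor vectors instead of summing them would be an equally plausible variant.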

Funder

The authors received no financial support for the research, authorship, and/or publication of this article.

Publisher

Balkan Journal of Electrical & Computer Engineering (BAJECE)
