A New Method for Graph-Based Representation of Text in Natural Language Processing
-
Published:2023-06-27
Issue:13
Volume:12
Page:2846
-
ISSN:2079-9292
-
Container-title:Electronics
-
language:en
-
Short-container-title:Electronics
Author:
Probierz Barbara12ORCID, Hrabia Anita1ORCID, Kozak Jan12ORCID
Affiliation:
1. Department of Machine Learning, University of Economics in Katowice, 1 Maja 50, 40-287 Katowice, Poland 2. Łukasiewicz Research Network—Institute of Innovative Technologies EMAG, Leopolda 31, 40-189 Katowice, Poland
Abstract
Natural language processing is still an emerging field in machine learning. Access to more and more data sets in textual form, new applications for artificial intelligence and the need for simple communication with operating systems all simultaneously affect the importance of natural language processing in evolving artificial intelligence. Traditional methods of textual representation, such as Bag-of-Words, have some limitations that result from the lack of consideration of semantics and dependencies between words. Therefore, we propose a new approach based on graph representations, which takes into account both local context and global relationships between words, allowing for a more expressive textual representation. The aim of the paper is to examine the possibility of using graph representations in natural language processing and to demonstrate their use in text classification. An innovative element of the proposed approach is the use of common cliques in graphs representing documents to create a feature vector. Experiments confirm that the proposed approach can improve classification efficiency. The use of a new text representation method to predict book categories based on the analysis of its content resulted in accuracy, precision, recall and an F1-score of over 90%. Moving from traditional approaches to a graph-based approach could make a big difference in natural language processing and text analysis and could open up new opportunities in the field.
Subject
Electrical and Electronic Engineering,Computer Networks and Communications,Hardware and Architecture,Signal Processing,Control and Systems Engineering
Reference67 articles.
1. Bales, M.E., Wright, D.N., Oxley, P.R., and Wheeler, T.R. (2020). Bibliometric Visualization and Analysis Software: State of the Art, Workflows, and Best Practices, Cornell University. 2. Jaradeh, M.Y., Oelen, A., Farfar, K.E., Prinz, M., D’Souza, J., Kismihók, G., Stocker, M., and Auer, S. (2019, January 19–21). Open research knowledge graph: Next generation infrastructure for semantic scholarly knowledge. Proceedings of the 10th International Conference on Knowledge Capture, Marina Del Rey, CA, USA. 3. Topic and sentiment aware microblog summarization for twitter;Ali;J. Intell. Inf. Syst.,2020 4. Wanigasooriya, A., and Silva, W.P.D. (2021). Automated Text Classification of Library Books into the Dewey Decimal Classification (DDC), University of Kelaniya. 5. Advances in natural language processing;Hirschberg;Science,2015
Cited by
1 articles.
订阅此论文施引文献
订阅此论文施引文献,注册后可以免费订阅5篇论文的施引文献,订阅后可以查看论文全部施引文献
|
|