A New Method for Graph-Based Representation of Text in Natural Language Processing

Author:

Probierz Barbara12ORCID,Hrabia Anita1ORCID,Kozak Jan12ORCID

Affiliation:

1. Department of Machine Learning, University of Economics in Katowice, 1 Maja 50, 40-287 Katowice, Poland

2. Łukasiewicz Research Network—Institute of Innovative Technologies EMAG, Leopolda 31, 40-189 Katowice, Poland

Abstract

Natural language processing is still an emerging field in machine learning. Access to more and more data sets in textual form, new applications for artificial intelligence and the need for simple communication with operating systems all simultaneously affect the importance of natural language processing in evolving artificial intelligence. Traditional methods of textual representation, such as Bag-of-Words, have some limitations that result from the lack of consideration of semantics and dependencies between words. Therefore, we propose a new approach based on graph representations, which takes into account both local context and global relationships between words, allowing for a more expressive textual representation. The aim of the paper is to examine the possibility of using graph representations in natural language processing and to demonstrate their use in text classification. An innovative element of the proposed approach is the use of common cliques in graphs representing documents to create a feature vector. Experiments confirm that the proposed approach can improve classification efficiency. The use of a new text representation method to predict book categories based on the analysis of its content resulted in accuracy, precision, recall and an F1-score of over 90%. Moving from traditional approaches to a graph-based approach could make a big difference in natural language processing and text analysis and could open up new opportunities in the field.

Publisher

MDPI AG

Subject

Electrical and Electronic Engineering,Computer Networks and Communications,Hardware and Architecture,Signal Processing,Control and Systems Engineering

Reference67 articles.

1. Bales, M.E., Wright, D.N., Oxley, P.R., and Wheeler, T.R. (2020). Bibliometric Visualization and Analysis Software: State of the Art, Workflows, and Best Practices, Cornell University.

2. Jaradeh, M.Y., Oelen, A., Farfar, K.E., Prinz, M., D’Souza, J., Kismihók, G., Stocker, M., and Auer, S. (2019, January 19–21). Open research knowledge graph: Next generation infrastructure for semantic scholarly knowledge. Proceedings of the 10th International Conference on Knowledge Capture, Marina Del Rey, CA, USA.

3. Topic and sentiment aware microblog summarization for twitter;Ali;J. Intell. Inf. Syst.,2020

4. Wanigasooriya, A., and Silva, W.P.D. (2021). Automated Text Classification of Library Books into the Dewey Decimal Classification (DDC), University of Kelaniya.

5. Advances in natural language processing;Hirschberg;Science,2015

Cited by 1 articles. 订阅此论文施引文献 订阅此论文施引文献,注册后可以免费订阅5篇论文的施引文献,订阅后可以查看论文全部施引文献

同舟云学术

1.学者识别学者识别

2.学术分析学术分析

3.人才评估人才评估

"同舟云学术"是以全球学者为主线,采集、加工和组织学术论文而形成的新型学术文献查询和分析系统,可以对全球学者进行文献检索和人才价值评估。用户可以通过关注某些学科领域的顶尖人物而持续追踪该领域的学科进展和研究前沿。经过近期的数据扩容,当前同舟云学术共收录了国内外主流学术期刊6万余种,收集的期刊论文及会议论文总量共计约1.5亿篇,并以每天添加12000余篇中外论文的速度递增。我们也可以为用户提供个性化、定制化的学者数据。欢迎来电咨询!咨询电话:010-8811{复制后删除}0370

www.globalauthorid.com

TOP

Copyright © 2019-2024 北京同舟云网络信息技术有限公司
京公网安备11010802033243号  京ICP备18003416号-3