Affiliation:
1. Department of Informatics, Constantine the Philosopher University in Nitra, Trieda Andreja Hlinku 1, 949 74 Nitra, Slovakia
2. Institute of Computer Science, Pedagogical University of Cracow, ul. Podchorążych 2, 30-084 Kraków, Poland
Abstract
Classification in natural language processing tasks can be improved in several ways. In this article, we focus on coreference resolution, which we applied to a manually annotated dataset of true and fake news used for the classification task of fake news detection. The research aimed to determine whether classification is more effective when coreference resolution is performed on the input data beforehand than when it is not, and therefore whether incorporating coreference resolution into the data preparation process can improve classifier performance metrics. We propose a methodology and describe its implementation in detail: identifying entity mentions in the text with the neuralcoref algorithm, building vector representations of the text (TF–IDF, Doc2Vec), and training several machine learning classifiers. The implemented classifiers were then compared using the performance metrics described in the theoretical part. The best accuracy was observed for the dataset with coreference resolution applied, with a median value of 0.8149; the best F1 score had a median value of 0.8101. The more important finding, however, is that applying coreference resolution to the data improved the performance metrics in the classification tasks.
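The abstract reports median accuracy and F1 values. As a rough illustration of how such figures can be computed, the following sketch calculates accuracy and F1 for binary labels and takes the median over several runs. The function names, the choice of 1 as the "fake" (positive) class, and the toy labels are all assumptions made for this example; they are not the authors' code or data.

```python
from statistics import median


def accuracy_and_f1(y_true, y_pred, positive=1):
    """Return (accuracy, F1) for binary labels.

    `positive` marks the class treated as positive; using 1 for
    fake news is an assumption made for this sketch.
    """
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and of equal length")
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    accuracy = sum(t == p for t, p in pairs) / len(pairs)
    # F1 = 2TP / (2TP + FP + FN); defined as 0.0 when there are no positives at all
    denom = 2 * tp + fp + fn
    f1 = 2 * tp / denom if denom else 0.0
    return accuracy, f1


def median_scores(runs):
    """Median accuracy and median F1 over several (y_true, y_pred) runs."""
    scores = [accuracy_and_f1(t, p) for t, p in runs]
    return median(a for a, _ in scores), median(f for _, f in scores)


# Toy labels only (1 = fake, 0 = true); not taken from the article's dataset.
toy_runs = [
    ([1, 0, 1, 0], [1, 0, 0, 0]),
    ([1, 1, 0, 0], [1, 1, 0, 1]),
    ([0, 1, 0, 1], [0, 1, 0, 1]),
]
median_accuracy, median_f1 = median_scores(toy_runs)
```

Comparing the two preprocessing variants would then amount to calling `median_scores` once on the runs obtained with coreference resolution and once on the runs obtained without it.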
Funder
Scientific Grant Agency of the Ministry of Education of the Slovak Republic
Slovak Research and Development Agency
European Commission ERASMUS+ Program 2021
Subject
Fluid Flow and Transfer Processes, Computer Science Applications, Process Chemistry and Technology, General Engineering, Instrumentation, General Materials Science