Unveiling AI-Generated Financial Text: A Computational Approach Using Natural Language Processing and Generative Artificial Intelligence

Published:2024-05-15 Issue:5 Volume:12 Page:101
ISSN:2079-3197
Container-title:Computation
language:en
Short-container-title:Computation

Author:

Arshed Muhammad Asad¹, Gherghina Ștefan Cristian², Dewi Christine³⁴, Iqbal Asma¹, Mumtaz Shahzad⁵⁶

Affiliation:

1. Department of Software Engineering, University of Management and Technology, Lahore 54770, Pakistan

2. Department of Finance, Bucharest University of Economic Studies, 6 Piata Romana, 010374 Bucharest, Romania

3. Department of Information Technology, Satya Wacana Christian University, Salatiga 50715, Indonesia

4. School of Information Technology, Deakin University, Campus 221 Burwood Hwy, Burwood, VIC 3125, Australia

5. Department of Data Science, The Islamia University of Bahawalpur, Bahawalpur 63100, Pakistan

6. School of Natural and Computing Sciences, University of Aberdeen, Aberdeen AB24 3FX, Scotland, UK

Abstract

This study is an in-depth exploration of the nascent field of Natural Language Processing (NLP) and generative Artificial Intelligence (AI), and it concentrates on the vital task of distinguishing between human-generated text and content produced by AI models. In particular, this research pioneers the identification of financial text derived from AI models such as ChatGPT and paraphrasing tools like QuillBot. While our primary focus is on financial content, we have also identified texts generated by paragraph rewriting tools and used ChatGPT for various contexts; this multiclass identification was missing in previous studies. In this paper, we use a comprehensive feature extraction methodology that combines TF–IDF with Word2Vec, alongside each feature extraction method used individually. Notably, combining a Random Forest model with Word2Vec yields the strongest results. Moreover, this study investigates the significance of the window size parameter in the Word2Vec approach, revealing that a window size of one produces the best scores across various metrics, with accuracy, precision, recall, and the F1 measure all reaching 0.74. In addition, the developed model performs well in classification, attaining AUC values of 0.94 for the ‘GPT’ class, 0.77 for the ‘Quil’ class, and 0.89 for the ‘Real’ class. We also achieved an accuracy of 0.72, precision of 0.71, recall of 0.72, and F1 of 0.71 on our extended prepared dataset. This study contributes to the evolving landscape of AI text identification, providing valuable insights and promising directions for future research.
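
To make the Word2Vec window size parameter mentioned in the abstract more concrete, the following is a minimal sketch of which (center, context) token pairs fall inside a given window. It is an illustration only and not the authors' implementation: the function name `context_pairs` and the example sentence are hypothetical, and a real Word2Vec trainer does considerably more than enumerate pairs.

```python
from typing import List, Tuple


def context_pairs(tokens: List[str], window: int) -> List[Tuple[str, str]]:
    """Return (center, context) pairs for tokens at most `window` positions apart."""
    if window < 1:
        raise ValueError("window must be >= 1")
    pairs = []
    for i, center in enumerate(tokens):
        # Clamp the window to the sentence boundaries.
        lo = max(0, i - window)
        hi = min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:  # a token is not its own context
                pairs.append((center, tokens[j]))
    return pairs


# Hypothetical tokenized sentence, not taken from the paper's dataset.
sentence = ["bank", "raises", "interest", "rates"]
pairs_w1 = context_pairs(sentence, window=1)  # immediate neighbours only
pairs_w2 = context_pairs(sentence, window=2)  # also tokens two positions away
```

With a window of one, each token is paired only with its immediate neighbours, so the learned vectors reflect very local word order; larger windows add pairs such as ("bank", "interest") that span more distant tokens.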

Publisher

MDPI AG

Link

https://www.mdpi.com/2079-3197/12/5/101/pdf
