Prediction of Machine-Generated Financial Tweets Using Advanced Bidirectional Encoder Representations from Transformers
Published: 2024-06-06
Issue: 11
Volume: 13
Page: 2222
ISSN: 2079-9292
Container-title: Electronics
Language: en
Short-container-title: Electronics
Author:
Arshed Muhammad Asad 1, Gherghina Ștefan Cristian 2, Dur-E-Zahra 1, Manzoor Mahnoor 1
Affiliation:
1. School of Systems and Technology, University of Management and Technology, Lahore 54770, Pakistan
2. Department of Finance, Bucharest University of Economic Studies, 6 Piata Romana, 010374 Bucharest, Romania
Abstract
With the rise of Large Language Models (LLMs), distinguishing genuine content from AI-generated content has become challenging, particularly in finance. Previous studies have focused on binary identification of ChatGPT-generated content and have overlooked other AI tools used for text regeneration. This study addresses that gap by examining several types of AI-regenerated content in the finance domain.
Objective: The study aims to differentiate human-generated financial content from AI-regenerated content, specifically content produced with ChatGPT, QuillBot, and SpinBot. It constructs a dataset of real text and AI-regenerated text for this purpose.
Contribution: The research provides a dataset that covers several types of AI-regenerated financial content. It also evaluates the performance of different models and highlights the effectiveness of the Bidirectional Encoder Representations from Transformers (BERT) Base Cased model in distinguishing between these content types.
Methods: The dataset is preprocessed to ensure quality and reliability. Several models, including BERT Base Cased, are fine-tuned and compared with traditional machine learning models that use TF-IDF and Word2Vec features.
Results: The BERT Base Cased model outperforms the other models, achieving an accuracy, precision, recall, and F1 score of 0.73, 0.73, 0.73, and 0.72, respectively, in distinguishing between real and AI-regenerated financial content.
Conclusions: The study demonstrates the effectiveness of the BERT Base Cased model in differentiating human-generated financial content from AI-regenerated content. It highlights the importance of considering multiple AI tools when identifying synthetic content, particularly in the finance domain in Pakistan.
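The four metrics reported in the Results can be computed for a multi-class problem as in the minimal sketch below. It is an illustration only, not the authors' evaluation code: the abstract does not say how the per-class scores were averaged, so both macro and weighted averages are shown, and the function name `evaluate`, the class names in `LABELS`, and the toy labels are assumptions made up for this example.

```python
from collections import Counter

# Assumed class names; the abstract names the sources but not the label encoding.
LABELS = ["human", "chatgpt", "quillbot", "spinbot"]


def evaluate(y_true, y_pred, labels=LABELS):
    """Return accuracy plus macro- and weighted-averaged (precision, recall, F1)."""
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must be non-empty and of equal length")

    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1
        else:
            fp[pred] += 1   # predicted class gains a false positive
            fn[truth] += 1  # true class gains a false negative

    support = Counter(y_true)
    per_class = {}
    for c in labels:
        precision = tp[c] / (tp[c] + fp[c]) if tp[c] + fp[c] else 0.0
        recall = tp[c] / (tp[c] + fn[c]) if tp[c] + fn[c] else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        per_class[c] = (precision, recall, f1)

    n = len(y_true)
    # Macro: every class counts equally. Weighted: classes count by support.
    macro = tuple(sum(per_class[c][i] for c in labels) / len(labels)
                  for i in range(3))
    weighted = tuple(sum(per_class[c][i] * support[c] for c in labels) / n
                     for i in range(3))
    return {
        "accuracy": sum(tp.values()) / n,
        "macro": macro,
        "weighted": weighted,
        "per_class": per_class,
    }


if __name__ == "__main__":
    # Toy labels invented for illustration; these are not data from the study.
    y_true = ["human", "human", "chatgpt", "chatgpt",
              "quillbot", "quillbot", "spinbot", "spinbot"]
    y_pred = ["human", "chatgpt", "chatgpt", "chatgpt",
              "quillbot", "spinbot", "spinbot", "spinbot"]
    report = evaluate(y_true, y_pred)
    print("accuracy:", round(report["accuracy"], 2))
    print("macro (P, R, F1):", [round(v, 2) for v in report["macro"]])
```

Because F1 is computed per class before averaging, the averaged F1 can differ slightly from the averaged precision and recall, as it does in the reported 0.73, 0.73, and 0.72.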