Abstract
This study presents a comprehensive comparison of fine-tuned GPT models (GPT-2, GPT-3, GPT-3.5) and LLaMA-2 models (LLaMA-2 7B, LLaMA-2 13B, LLaMA-2 70B) in text classification, covering dataset sizes, model scales, and task diversity. Since its inception in 2018, the GPT series has been pivotal in advancing NLP, with each iteration introducing substantial enhancements. Despite this progress, detailed analyses remain scarce, especially comparisons against competitive open-source models such as the LLaMA-2 series in text classification. The current study fills this gap by fine-tuning these models on varied datasets, focusing on task-specific performance in hate speech and offensive language detection, fake news classification, and sentiment analysis. The learning efficacy and efficiency of the GPT and LLaMA-2 models were evaluated, providing a guide to choosing models for NLP tasks based on architectural benefits and adaptation efficiency with limited data and resources. In particular, even with datasets as small as 1,000 rows per class, the F1 scores of the GPT-3.5 and LLaMA-2 models exceeded 0.9, reaching 0.99 with the complete datasets. Additionally, the LLaMA-2 13B and 70B models outperformed GPT-3, demonstrating their efficiency and effectiveness in text classification. Both the GPT and LLaMA-2 series performed well on all three tasks, underscoring their ability to handle diverse tasks. Considering model size, performance, and the resources required for fine-tuning, this study identifies LLaMA-2 13B as the optimal model for NLP tasks.
Publisher
Engineering, Technology & Applied Science Research