Affiliation:
1. Kakatiya University
2. Christu Jyothi Institute of Technology and Science
Abstract
Large language models (LLMs) have emerged as powerful tools for generating human-quality text, raising concerns about their potential for misuse in academic settings. This paper investigates the use of DistilBERT, a distilled version of BERT, for detecting LLM-generated text. We evaluate its performance on two publicly available datasets, LLM-Detect AI Generated Text and DAIGT-V3 Train, where it achieves an average accuracy of approximately 94%. Our findings suggest that DistilBERT is a promising tool for safeguarding academic integrity in the era of LLMs.
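For context, the setup the abstract describes is standard binary sequence classification. Below is a minimal sketch of such a detector built with the Hugging Face transformers library; the checkpoint, hyperparameters, and toy data are illustrative assumptions, not necessarily the authors' exact configuration.

```python
# A minimal sketch of a DistilBERT detector fine-tuned to classify text as
# human-written (0) vs. LLM-generated (1). The checkpoint, learning rate,
# and toy examples are assumptions for illustration only.
import torch
from transformers import DistilBertTokenizerFast, DistilBertForSequenceClassification

tokenizer = DistilBertTokenizerFast.from_pretrained("distilbert-base-uncased")
model = DistilBertForSequenceClassification.from_pretrained(
    "distilbert-base-uncased", num_labels=2  # 0 = human, 1 = LLM-generated
)

texts = ["An essay drafted by a student.", "An essay produced by a chatbot."]
labels = torch.tensor([0, 1])

# Tokenize with padding/truncation to DistilBERT's 512-token input limit.
batch = tokenizer(texts, padding=True, truncation=True, max_length=512,
                  return_tensors="pt")

# One illustrative gradient step; a real run would iterate over the
# Kaggle datasets named in the abstract with a DataLoader.
optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)
model.train()
loss = model(**batch, labels=labels).loss
loss.backward()
optimizer.step()

# Inference: argmax over the two logits gives the predicted class.
model.eval()
with torch.no_grad():
    logits = model(**batch).logits
preds = logits.argmax(dim=-1)  # tensor of 0s (human) and 1s (LLM)
```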
Publisher
Research Square Platform LLC
References (19 articles)
1. Kim, J. K. and Chua, M. and Rickard, M. and Lorenzo, A. J. (2023) ChatGPT and large language model (LLM) chatbots: The current state of acceptability and a proposal for guidelines on utilization in academic medicine. Journal of Pediatric Urology 19(5): 598–604 https://doi.org/10.1016/j.jpurol.2023.05.018
2. Jungherr, A. (2023) Using ChatGPT and Other Large Language Model (LLM) Applications for Academic Paper Assignments. https://doi.org/10.31235/osf.io/d84q6
3. Devlin, J. and Chang, M. and Lee, K. and Toutanova, K. (2018) BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. https://arxiv.org/pdf/1810.04805v2
4. Sanh, V. and Debut, L. and Chaumond, J. and Wolf, T. (2019) DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter. https://arxiv.org/pdf/1910.01108.pdf
5. Joshy, A. and Sundar, S. (2022) Analyzing the Performance of Sentiment Analysis using BERT, DistilBERT, and RoBERTa. https://doi.org/10.1109/iprecon55716.2022.10059542