A model-independent redundancy measure for human versus ChatGPT authorship discrimination using a Bayesian probabilistic approach

Author:

Bozza Silvia, Roten Claude-Alain, Jover Antoine, Cammarota Valentina, Pousaz Lionel, Taroni Franco

Abstract

The academic and scientific world in general is increasingly concerned about its inability to determine and ascertain the identity of the writer of a text. More and more often the question arises as to whether a scientific article or work handed in by a student was actually produced by the alleged author of the questioned text. The role of artificial intelligence (AI) is increasingly debated because of the dangers of its undeclared use. A current example is undoubtedly the undeclared use of ChatGPT to write a scientific text. The article promotes an AI model-independent redundancy measure to support discrimination between hypotheses on authorship of various multilingual texts written by humans or produced by intelligence media such as ChatGPT. The syntax of texts written by humans tends to differ from that of texts produced by AIs. This difference can be grasped and quantified even with short texts (i.e. 1800 characters). This aspect of length is extremely important, because short texts are more difficult to analyze when characterizing authorship. To meet the efficiency criteria required for the evaluation of forensic evidence, a probabilistic approach is implemented. In particular, to assess the value of the redundancy measure and to offer a consistent classification criterion, a metric called the Bayes factor is used. The proposed Bayesian probabilistic method represents an original approach in stylometry. Analyses performed over multilingual texts (English and French) covering different scientific and human areas of interest (forensic science and socio-psycho-artistic topics) reveal the feasibility of successful authorship discrimination with limited misclassification rates. Model performance is satisfactory even with small sample sizes.
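
The abstract does not specify the statistical model, so the following is only a minimal sketch of how a Bayes factor can serve as a classification criterion for a single redundancy score. It assumes, purely for illustration, that the score is normally distributed under each authorship hypothesis; the parameter values, the function names, and the decision threshold of 1 (which corresponds to equal prior probabilities and symmetric misclassification costs) are hypothetical and are not taken from the article.

```python
import math


def normal_pdf(x, mean, sd):
    """Density of a normal distribution with the given mean and standard deviation."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def bayes_factor(x, human, chatgpt):
    """Bayes factor BF = f(x | human author) / f(x | ChatGPT).

    `human` and `chatgpt` are (mean, sd) pairs; the normal model is an
    illustrative assumption, not the model used in the article.
    """
    return normal_pdf(x, *human) / normal_pdf(x, *chatgpt)


def classify(x, human, chatgpt, threshold=1.0):
    """Assign the text to the hypothesis favoured by the Bayes factor."""
    return "human" if bayes_factor(x, human, chatgpt) > threshold else "chatgpt"


# Hypothetical (mean, sd) of the redundancy measure under each hypothesis.
HUMAN = (0.40, 0.05)
CHATGPT = (0.50, 0.05)

if __name__ == "__main__":
    for score in (0.38, 0.47, 0.55):
        bf = bayes_factor(score, HUMAN, CHATGPT)
        print(f"score={score:.2f}  BF={bf:.3g}  ->  {classify(score, HUMAN, CHATGPT)}")
```

A Bayes factor above 1 means the observed score is more probable if a human wrote the text, and a value below 1 favours ChatGPT authorship; the further the value lies from 1, the stronger the support.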

Funder

Swiss National Science Foundation

Publisher

Springer Science and Business Media LLC

Subject

Multidisciplinary
