Short text classification approach to identify child sexual exploitation material

Authors

Al-Nabki MHD Wesam, Fidalgo Eduardo, Alegre Enrique, Alaiz-Rodriguez Rocio

Abstract

Producing or sharing Child Sexual Exploitation Material (CSEM) is a severe crime that Law Enforcement Agencies (LEAs) fight daily. When an LEA seizes a computer from a potential producer or consumer of CSEM, it analyzes the suspect's storage devices looking for evidence. Manual inspection for CSEM is time-consuming, given the limited time Spanish police have to act on a search warrant. Our approach to speeding up the identification of CSEM-related files is to analyze only the file names and their absolute paths rather than their content. The main challenge lies in handling short and sparse texts that file owners deliberately distort using obfuscated words and user-defined naming patterns. We present two approaches to CSEM identification. The first employs two independent classifiers, one for the file name and the other for the file path, and then combines their outputs. The second, in contrast, uses only the file name classifier and iterates it over the absolute path. Both operate at the character n-gram level, and we introduce novel binary and orthographic features to enrich the text representation. We benchmarked six classification models based on machine learning and convolutional neural networks. The proposed classifier achieves an F1 score of 0.988, making it a promising tool for LEAs.
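To make the text representation described in the abstract more concrete, the following is a minimal sketch, not the authors' implementation. It shows character n-gram counting, a few simple orthographic descriptors, splitting an absolute path into a path part and a file name part, and one possible way to combine two classifier scores. The function names, the specific features, the n-gram range, and the weighted-average fusion rule are all assumptions made for illustration; the abstract states only that the two classifiers' outputs are combined and does not list the features.

```python
import re
from collections import Counter
from pathlib import PurePosixPath


def char_ngrams(text, n_min=2, n_max=3):
    """Count character n-grams of a lowercased string (range is an assumption)."""
    text = text.lower()
    counts = Counter()
    for n in range(n_min, n_max + 1):
        for i in range(len(text) - n + 1):
            counts[text[i:i + n]] += 1
    return counts


def orthographic_features(text):
    """Hypothetical orthographic descriptors; not the feature set from the paper."""
    length = max(len(text), 1)  # avoid division by zero on empty strings
    return {
        "length": len(text),
        "digit_ratio": sum(c.isdigit() for c in text) / length,
        "upper_ratio": sum(c.isupper() for c in text) / length,
        "symbol_ratio": sum(not c.isalnum() for c in text) / length,
        "has_digit_run": bool(re.search(r"\d{4,}", text)),
    }


def split_path(absolute_path):
    """Split an absolute POSIX-style path into (directory path, file name)."""
    p = PurePosixPath(absolute_path)
    return str(p.parent), p.name


def combine_scores(name_score, path_score, name_weight=0.5):
    """Assumed fusion rule: weighted average of the two classifier scores."""
    return name_weight * name_score + (1 - name_weight) * path_score


if __name__ == "__main__":
    directory, file_name = split_path("/home/user/docs/Report_2021.pdf")
    print(directory, file_name)
    print(char_ngrams(file_name).most_common(3))
    print(orthographic_features(file_name))
```

In a pipeline of this shape, the n-gram counts and orthographic descriptors would be turned into a feature vector for each of the two classifiers, and `combine_scores` would merge their predicted probabilities. Character-level features are a natural fit here because they tolerate the misspellings and obfuscations that defeat word-level tokenization.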

Funder

Spanish National Cybersecurity Institute

Publisher

Springer Science and Business Media LLC

Subject

Multidisciplinary


Cited by 2 articles.
