MEDAL: A Multimodality-Based Effective Data Augmentation Framework for Illegal Website Identification
Published:2024-06-05
Issue:11
Volume:13
Page:2199
ISSN:2079-9292
Container-title:Electronics
language:en
Short-container-title:Electronics
Author:
Wen Li1, Zhang Min1, Wang Chenyang1, Guo Bingyang1, Ma Huimin1, Xue Pengfei1, Ding Wanmeng1, Zheng Jinghua1
Affiliation:
1. College of Electronic Engineering, National University of Defense Technology, Hefei 230037, China
Abstract
The emergence of illegal (gambling, pornography, and attraction) websites seriously threatens the security of society. Because illegal websites conceal themselves, labeled data are difficult to obtain in large quantities. Moreover, most illegal websites disguise themselves to avoid detection; for example, some gambling websites visually resemble gaming websites. Existing methods, however, overlook such camouflage within a single modality. To address these problems, this paper proposes MEDAL, a multimodality-based effective data augmentation framework for illegal website identification. First, we establish an illegal website identification framework based on tri-training that combines information from different modalities (image, text, and HTML) while making full use of abundant unlabeled data. Then, we design a multimodal mutual assistance module, integrated with the tri-training framework, that mitigates the erroneous information introduced during tri-training when the single-modal classifiers perform unevenly. Finally, experimental results on a self-developed dataset demonstrate the effectiveness of the proposed framework, which performs well on accuracy, precision, recall, and F1 metrics.
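The tri-training idea mentioned in the abstract can be illustrated with a minimal sketch of its agreement step: an unlabeled page receives a pseudo-label for one modality's classifier when the classifiers of the other two modalities agree on it. This is a generic illustration of tri-training under stated assumptions, not the MEDAL implementation; the function name `tri_training_pseudo_labels`, the label encoding (0 = normal, 1 = illegal), and the example predictions are all hypothetical, and the multimodal mutual assistance module is not modeled.

```python
from typing import Dict, List, Sequence, Tuple

# The three modalities named in the abstract.
MODALITIES = ("image", "text", "html")


def tri_training_pseudo_labels(
    preds: Dict[str, Sequence[int]],
) -> Dict[str, List[Tuple[int, int]]]:
    """For each modality, collect (sample_index, label) pairs on which the
    classifiers of the other two modalities agree (hypothetical helper)."""
    n = len(preds[MODALITIES[0]])
    if any(len(preds[m]) != n for m in MODALITIES):
        raise ValueError("all modalities must predict on the same samples")

    pseudo: Dict[str, List[Tuple[int, int]]] = {m: [] for m in MODALITIES}
    for target in MODALITIES:
        peer_a, peer_b = [m for m in MODALITIES if m != target]
        for i in range(n):
            # Agreement between the two peers yields a pseudo-label for the
            # target modality, regardless of the target's own prediction.
            if preds[peer_a][i] == preds[peer_b][i]:
                pseudo[target].append((i, preds[peer_a][i]))
    return pseudo


# Made-up predictions on four unlabeled pages (0 = normal, 1 = illegal).
example_preds = {
    "image": [1, 0, 1, 0],
    "text":  [1, 0, 0, 1],
    "html":  [1, 1, 0, 0],
}
pseudo_labels = tri_training_pseudo_labels(example_preds)
```

In this toy input, page 2 is pseudo-labeled 0 for the image classifier even though that classifier predicted 1, which is how agreement between two modalities can correct a third that was misled by visual camouflage. In a full tri-training loop, each classifier would then be retrained on its labeled data plus these pseudo-labeled pages.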
Funder
National Key R&D Program of China