Affiliation:
1. Faculty of Computing, Harbin Institute of Technology, Harbin 150001, China
2. Beijing Qihoo Technology Co., Ltd, Beijing 100015, China
Abstract
Cybercriminals often register pornographic or gambling domains (known as abusive domains) in bulk, choosing names with similar character compositions to reduce the cost of buying domains and to make the names easier for clients to remember and share. This study therefore combines ideas from text similarity and text generation and proposes a GRU-based abusive domain generation model that rapidly generates new abusive domain names from known ones. We also develop a two-layer detection system for pornographic and gambling domains, built on fastText and CNN models, to obtain the abusive domain dataset used for model training and validation. The detection system identifies pornographic and gambling domains with 99% precision while balancing correctness and speed. By inputting 40,000 random keywords into the generation model, we obtained 130,220 online domains that served web pages, of which about 66% were pornographic or gambling domains. The results show that, by exploiting cybercriminals’ registration behaviors, such as bulk registration of similar domain names, we can prospectively acquire a large number of new abusive domains from known ones. Predicting new abusive domains not only expands the domain blacklist but also allows researchers to target the generated suspicious domains and dispose of them before they show abusive behavior.
Funder
Young Teacher Development Fund of Harbin Institute of Technology
Subject
Electrical and Electronic Engineering, Computer Networks and Communications, Information Systems
Cited by
3 articles.