Affiliations:
1. Veermata Jijabai Technological Institute, India
2. Veermata Jijabai Technological Institute, India
Abstract
The dark web contains sensitive data that strategic organizations must identify well in advance to anticipate and handle threats. However, associates would prefer to classify dark web pages automatically rather than open them, because the pages can contain disturbing images and dangerous links and attachments. Most research focuses on analyzing web page text to infer dark web content, but no attempt to classify dark web content at the structural level has been observed in the literature. In this chapter, the extended scope of the work aims to predict the genre of a web page without opening it. The work converts web pages into their DOM (Document Object Model) graphs, which represent web page structure. A GNN (graph neural network) is trained on the constructed DOM graphs to predict each page's genre. Graph properties such as the number of nodes and edges are extracted from each web page's DOM graph. Unsupervised learning (k-means clustering) is then performed on the dataset to group the web pages into clusters based on structural similarity.
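The abstract does not specify the parser, the exact feature set, or the clustering implementation, so the following is only a minimal, stdlib-only Python sketch of the structural half of the pipeline under our own assumptions. It parses raw HTML as text with `html.parser` (so no scripts run and no images load), stores the DOM graph as a parent-to-child edge list, derives a hypothetical feature vector (node count, edge count, tree depth, hyperlink count), z-scores it, and groups pages with a plain k-means loop. The names `DomGraphBuilder`, `structural_features`, `standardize` and `kmeans` are illustrative, and the GNN genre classifier is not shown.

```python
import random
from html.parser import HTMLParser

# HTML void elements never get a closing tag, so they are always leaves.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class DomGraphBuilder(HTMLParser):
    """Hypothetical DOM-graph builder: node i has a tag label; edges run parent -> child."""

    def __init__(self):
        super().__init__()
        self.labels = ["#root"]   # node 0 is a synthetic document root
        self.edges = []           # (parent_id, child_id) pairs
        self.stack = [0]          # ids of currently open elements

    def _add_node(self, tag):
        node_id = len(self.labels)
        self.labels.append(tag)
        self.edges.append((self.stack[-1], node_id))
        return node_id

    def handle_starttag(self, tag, attrs):
        node_id = self._add_node(tag)
        if tag not in VOID_TAGS:
            self.stack.append(node_id)

    def handle_startendtag(self, tag, attrs):
        self._add_node(tag)       # self-closing tag such as <br/>: a leaf

    def handle_endtag(self, tag):
        # Pop back to the most recent open element with this tag; ignore stray end tags.
        for i in range(len(self.stack) - 1, 0, -1):
            if self.labels[self.stack[i]] == tag:
                del self.stack[i:]
                break


def structural_features(html):
    """Assumed feature vector: [nodes, edges, max depth, number of <a> links]."""
    builder = DomGraphBuilder()
    builder.feed(html)
    builder.close()
    children = {}
    for parent, child in builder.edges:
        children.setdefault(parent, []).append(child)
    max_depth, stack = 0, [(0, 0)]  # iterative DFS from the synthetic root
    while stack:
        node, depth = stack.pop()
        max_depth = max(max_depth, depth)
        stack.extend((c, depth + 1) for c in children.get(node, []))
    return [len(builder.labels), len(builder.edges), max_depth, builder.labels.count("a")]


def standardize(rows):
    """Z-score each feature column so no single feature dominates the distance."""
    cols = list(zip(*rows))
    means = [sum(c) / len(c) for c in cols]
    stds = [(sum((x - m) ** 2 for x in c) / len(c)) ** 0.5 or 1.0
            for c, m in zip(cols, means)]
    return [[(x - m) / s for x, m, s in zip(row, means, stds)] for row in rows]


def kmeans(points, k, iters=50, seed=0):
    """Plain Lloyd's k-means; returns one cluster label per point."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    assign = [0] * len(points)
    for _ in range(iters):
        # Assignment step: nearest centroid by squared Euclidean distance.
        assign = [min(range(k),
                      key=lambda j: sum((a - b) ** 2 for a, b in zip(p, centroids[j])))
                  for p in points]
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, a in zip(points, assign) if a == j]
            if members:
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return assign


if __name__ == "__main__":
    pages = [  # toy pages, not data from the chapter
        "<html><body><p>hi</p></body></html>",
        "<html><body><p>a</p><p>b</p></body></html>",
        "<html><body>" + "<div><a href='#'>x</a></div>" * 20 + "</body></html>",
        "<html><body>" + "<div><a href='#'>y</a></div>" * 25 + "</body></html>",
    ]
    features = [structural_features(p) for p in pages]
    print(features)  # [[4, 3, 3, 0], [5, 4, 3, 0], [43, 42, 4, 20], [53, 52, 4, 25]]
    print(kmeans(standardize(features), k=2))
```

On these toy pages the two small text pages and the two link-heavy pages should land in separate clusters, although which cluster gets label 0 depends on the random initialization. Standardizing first is a design choice here: raw node counts are far larger than depths, so without it the distance would be driven almost entirely by page size.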