Analysis of Deep Learning Approach Based on Convolution Neural Network (CNN) for Classification of Web Page Title and Description Text

Published: 2022-12-31 Issue: 2 Volume: 11
ISSN: 2549-2403
Container-title: Compiler
Short-container-title: Compiler

Author:

Murdiyanto Aris Wahyu, Habibi Muhammad

Abstract

The volume of digital documents available online is growing exponentially with the increasing use of the internet. Information obtained online needs to be categorized so that readers can more easily identify and filter what they need. Web pages can be classified by their titles and descriptions, which are text data, using deep learning for text classification. This study conducted training and analysis experiments to determine the accuracy of the proposed deep learning architecture in classifying web page titles and descriptions. We propose a Convolution Neural Network (CNN) architecture that has few parameters. Training and evaluation were conducted on the web page dataset provided by DMOZ. The proposed CNN architecture with N = 1, where N is the number of (Dropout + 1D Convolution + ReLU activation) blocks, achieves the best validation accuracy: 79.51% with only 825,061 parameters. In accuracy, the proposed CNN architecture outperformed five other state-of-the-art technologies.
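The abstract ties model size to the number N of (Dropout + 1D Convolution + ReLU activation) blocks. The following is a minimal sketch of how the trainable parameter count of such a text CNN could be tallied. The embedding layer, the global pooling step, the final dense classifier, and every hyperparameter value used here are hypothetical assumptions for illustration; they are not taken from the paper and are not meant to reproduce its reported 825,061 parameters.

```python
# Sketch only: counts trainable parameters for an assumed text-CNN layout:
#   Embedding -> N x (Dropout + Conv1D + ReLU) -> global pooling -> Dense.
# The layout and all hyperparameter values are hypothetical, not the paper's.


def embedding_params(vocab_size: int, embed_dim: int) -> int:
    """One embed_dim-sized vector per vocabulary entry."""
    return vocab_size * embed_dim


def conv1d_params(in_channels: int, filters: int, kernel_size: int) -> int:
    """Weights (kernel_size x in_channels x filters) plus one bias per filter."""
    return kernel_size * in_channels * filters + filters


def dense_params(in_features: int, out_features: int) -> int:
    """Weight matrix plus one bias per output unit."""
    return in_features * out_features + out_features


def count_cnn_params(vocab_size: int, embed_dim: int, filters: int,
                     kernel_size: int, n_blocks: int, n_classes: int) -> int:
    """Total trainable parameters for the assumed layout."""
    total = embedding_params(vocab_size, embed_dim)
    channels = embed_dim
    for _ in range(n_blocks):
        # Dropout and ReLU add no trainable parameters; only the
        # convolution in each block contributes.
        total += conv1d_params(channels, filters, kernel_size)
        channels = filters
    # Assumes global pooling reduces the sequence to `channels` features.
    total += dense_params(channels, n_classes)
    return total


if __name__ == "__main__":
    # Hypothetical settings chosen only to show how N affects model size.
    for n in (1, 2, 3):
        print(n, count_cnn_params(vocab_size=10_000, embed_dim=64,
                                  filters=128, kernel_size=5,
                                  n_blocks=n, n_classes=13))
```

Under these assumed settings each extra block adds only the weights of one more convolution, and the embedding table dominates the total, so a small N keeps the model compact.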

Publisher

Institut Teknologi Dirgantara Adisutjipto (ITDA)

Subject

General Medicine
