Author:
Murdiyanto Aris Wahyu, Habibi Muhammad
Abstract
The volume of digital documents available online is growing exponentially with the increasing use of the internet. Information obtained online needs to be categorized so that recipients can more easily identify and filter what they need. Web pages can be classified by their titles and descriptions, which are text data and can therefore be handled with deep learning techniques for text classification. This study conducted training and analysis experiments to determine the accuracy of a proposed deep learning architecture in classifying web page titles and descriptions. We propose a Convolutional Neural Network (CNN) architecture with a small number of parameters. Training and evaluation were conducted on the web page dataset provided by DMOZ. The proposed CNN architecture achieves its best validation accuracy when the number N of (Dropout + 1D Convolution + ReLU activation) blocks equals 1. In that configuration it reaches 79.51% validation accuracy with only 825,061 parameters. The proposed CNN architecture outperformed five other state-of-the-art methods in accuracy.
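As a rough illustration of how the parameter count of such an architecture depends on N, the sketch below counts trainable parameters for a text CNN built from N (Dropout + 1D Convolution + ReLU) blocks. The abstract does not state the embedding layer, pooling, classifier head, or any hyperparameters, so the layer layout (embedding, then N blocks, then global pooling, then one dense output layer) and every numeric value here are assumptions chosen for illustration; the result is not expected to reproduce the reported 825,061 parameters.

```python
# Hypothetical parameter count for a small text CNN.
# Assumed layout (not confirmed by the abstract):
#   Embedding -> N x (Dropout + Conv1D + ReLU) -> global pooling -> Dense

def conv1d_params(in_channels, filters, kernel_size):
    # One weight per (kernel position, input channel, filter), plus one bias per filter.
    return kernel_size * in_channels * filters + filters

def dense_params(in_features, out_features):
    # Weight matrix plus one bias per output unit.
    return in_features * out_features + out_features

def count_params(vocab_size, embed_dim, filters, kernel_size, n_blocks, n_classes):
    total = vocab_size * embed_dim          # embedding table
    channels = embed_dim
    for _ in range(n_blocks):
        # Dropout and ReLU add no trainable parameters.
        total += conv1d_params(channels, filters, kernel_size)
        channels = filters
    # Global pooling adds no parameters and leaves `channels` features.
    total += dense_params(channels, n_classes)
    return total

if __name__ == "__main__":
    # All values below are illustrative assumptions, not the paper's settings.
    for n in (1, 2, 3):
        print(n, count_params(vocab_size=10_000, embed_dim=64, filters=128,
                              kernel_size=5, n_blocks=n, n_classes=5))
```

Under these assumed settings the embedding table dominates the total, and each additional block adds only the weights of one more convolution, which is one reason a single-block model can stay small.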
Publisher
Institut Teknologi Dirgantara Adisutjipto (ITDA)