Affiliation:
1. Department of Computer Science and Engineering, M.S. Ramaiah University of Applied Sciences, MSR Nagar, Bangalore, India
Abstract
Web data extraction has evolved over the years from extracting data from documents to extracting it from today's World Wide Web (WWW). The growth of the WWW has placed data at the centre of this ecosystem, benefiting society at large, businesses, and consumers. The proposed system uses a deep learning technique, the Faster region-based convolutional neural network (Faster R-CNN), for automated navigation, data extraction, and self-healing of the data extraction engine so that it adapts to dynamic changes in website layout. The system trains a Faster R-CNN model to detect products in a web page using bounding-box image detection and extracts product details with high extraction accuracy. Deep learning techniques for image detection have advanced rapidly across many fields; their application to web data extraction is what makes this paper unique. An e-commerce retail website is used as a real-world example to demonstrate the self-healing capability of the proposed automated web data extraction system.
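The abstract does not say how detected bounding boxes are turned into extracted product records. One plausible sketch matches each confident detection to the page element it overlaps most, measured by intersection-over-union (IoU). This assumes the page is rendered so each DOM element's bounding rectangle is known. The detector itself (for example, a trained Faster R-CNN) is not shown, and its outputs are stubbed. The names `Box`, `iou`, `match_detections_to_elements`, the element ids, and both thresholds are hypothetical and are not taken from the paper.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Box:
    # Hypothetical axis-aligned rectangle in page pixels: (x1, y1) top-left, (x2, y2) bottom-right.
    x1: float
    y1: float
    x2: float
    y2: float

    def area(self) -> float:
        # Negative widths or heights (no overlap) clamp to zero area.
        return max(0.0, self.x2 - self.x1) * max(0.0, self.y2 - self.y1)


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two boxes, in [0, 1]."""
    inter = Box(max(a.x1, b.x1), max(a.y1, b.y1), min(a.x2, b.x2), min(a.y2, b.y2))
    inter_area = inter.area()
    union = a.area() + b.area() - inter_area
    return inter_area / union if union > 0 else 0.0


def match_detections_to_elements(
    detections: list[tuple[Box, float]],
    elements: dict[str, Box],
    score_threshold: float = 0.5,  # assumed cut-off on detector confidence
    iou_threshold: float = 0.5,    # assumed minimum overlap to accept a match
) -> list[tuple[str, float]]:
    """For each confident detection, return the id of the best-overlapping element and its IoU."""
    matches: list[tuple[str, float]] = []
    for box, score in detections:
        if score < score_threshold:
            continue  # drop low-confidence detections
        best_id: Optional[str] = None
        best_iou = 0.0
        for el_id, el_box in elements.items():
            overlap = iou(box, el_box)
            if overlap > best_iou:
                best_id, best_iou = el_id, overlap
        if best_id is not None and best_iou >= iou_threshold:
            matches.append((best_id, best_iou))
    return matches


# Stubbed example: element rectangles and detector outputs are made up for illustration.
elements = {
    "card-1": Box(0, 0, 100, 100),
    "card-2": Box(120, 0, 220, 100),
    "banner": Box(0, 0, 400, 30),
}
detections = [
    (Box(2, 2, 98, 98), 0.93),
    (Box(118, 0, 222, 102), 0.88),
    (Box(300, 300, 320, 320), 0.20),  # below score_threshold, ignored
]
matches = match_detections_to_elements(detections, elements)
```

Here the two confident detections map to `card-1` and `card-2`. The overlapping banner is rejected because its IoU with each detection is low. Matching on rendered geometry rather than on hard-coded CSS selectors is one way a layout change need not break extraction, which is in the spirit of the self-healing claim. It is an illustration only, not the authors' method.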
Publisher
World Scientific Pub Co Pte Ltd
Subject
Library and Information Sciences, Computer Networks and Communications, Computer Science Applications
Cited by
4 articles.