A Web Information Extraction Framework with Adaptive and Failure Prediction Feature

Published: 2022-03-23; Volume: 14; Issue: 2; Pages: 1-21
ISSN: 1936-1955
Journal: Journal of Data and Information Quality
Language: English
Abbreviated title: J. Data and Information Quality

Author:

Patnaik Sudhir Kumar¹, Babu C. Narendra¹

Affiliation:

1. MS Ramaiah University of Applied Sciences, Bangalore, Karnataka, India

Abstract

The amount of information available on the internet today requires effective information extraction and processing to offer hyper-personalized user experiences. Dynamic changes in website layout can prevent information extraction with traditional and machine learning techniques, which makes it a significant challenge for the technical community to keep up with such changes. Existing machine learning-based information extraction frameworks focus only on extraction through core extraction logic that is susceptible to website changes, and therefore lack key capabilities such as proactive failure prediction and intelligent information extraction. This article builds a robust and intelligent information extraction framework that can both proactively predict website failure and automatically extract information using deep-learning techniques, namely You Only Look Once (YOLO) and Long Short-Term Memory (LSTM) networks. The LSTM-based proactive detection identifies the new location of the web page after layout changes and enables automatic extraction of information from the new page. A real-world case study on a retail website and an offline experimentation environment are set up to demonstrate proactive failure prediction and automatic extraction, yielding high failure-prediction performance and high precision and recall for both object detection and information extraction.
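The abstract reports precision and recall for object detection but does not say how detected regions are matched to ground truth. A common convention (used, for example, in PASCAL VOC evaluation) counts a predicted box as a true positive when its intersection over union (IoU) with a not-yet-matched ground-truth box is at least 0.5. The sketch below illustrates that convention only; the box format, the 0.5 threshold, the greedy highest-confidence-first matching, and the helper names `iou` and `precision_recall` are assumptions for illustration, not the authors' evaluation protocol.

```python
from typing import List, Tuple

# Assumed box format: (x_min, y_min, x_max, y_max) in page pixels.
Box = Tuple[float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def precision_recall(
    preds: List[Tuple[Box, float]], truths: List[Box], threshold: float = 0.5
) -> Tuple[float, float]:
    """Greedy matching (assumed): highest-confidence prediction first,
    each ground-truth box may be matched at most once."""
    matched = set()
    tp = 0
    for box, _score in sorted(preds, key=lambda p: p[1], reverse=True):
        best_j, best_iou = -1, threshold
        for j, truth in enumerate(truths):
            if j in matched:
                continue
            overlap = iou(box, truth)
            if overlap >= best_iou:
                best_j, best_iou = j, overlap
        if best_j >= 0:
            matched.add(best_j)
            tp += 1
    fp = len(preds) - tp   # predictions with no matching ground truth
    fn = len(truths) - tp  # ground-truth regions that were missed
    precision = tp / (tp + fp) if preds else 0.0
    recall = tp / (tp + fn) if truths else 0.0
    return precision, recall


# Hypothetical example: two labelled regions (e.g. a product title and a price).
truths = [(0, 0, 10, 10), (20, 20, 30, 30)]
preds = [((1, 1, 10, 10), 0.9), ((50, 50, 60, 60), 0.8)]
print(precision_recall(preds, truths))  # (0.5, 0.5)
```

Here the first prediction overlaps the first region with IoU 0.81 and counts as a true positive; the second overlaps nothing, so one false positive and one missed region give precision and recall of 0.5 each.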

Publisher

Association for Computing Machinery (ACM)

Subject

Information Systems and Management, Information Systems

Link

https://dl.acm.org/doi/pdf/10.1145/3495008

References: 38 articles.

Cited by 2 articles.

1. A Novel Energy Optimization Method in Construction Engineering; Lecture Notes on Data Engineering and Communications Technologies; 2024

2. A Method for Extracting Sensitive Information from Long Text Based on Natural Language Processing Technology; Proceedings of the 2023 International Conference on Communication Network and Machine Learning; 2023-10-27