Affiliation:
1. M. S. Ramaiah University of Applied Sciences, Bangalore, Karnataka, India
Abstract
The amount of information available on the internet today requires effective information extraction and processing to offer hyper-personalized user experiences. Traditional and machine learning techniques often fail to extract information when website layouts change dynamically, which makes it a significant challenge for the technical community to keep up with such changes. Existing machine learning-based information extraction frameworks focus only on core extraction logic that is susceptible to website changes, and thus lack key features such as proactive failure prediction and intelligent information extraction. The aim of this article is to build a robust and intelligent information extraction framework that not only proactively predicts website failures but also automatically extracts information using deep-learning techniques, namely You Only Look Once (YOLO) and Long Short-Term Memory (LSTM) networks. Proactive detection using LSTM detects the new location of the web page after layout changes and enables automatic extraction of information from the new web page. A real-world case with a retail website for intelligent information extraction and an offline experimentation environment are set up to demonstrate proactive failure prediction and automatic extraction, resulting in high failure-prediction performance as well as high precision and recall for object detection and information extraction.
Publisher
Association for Computing Machinery (ACM)
Subject
Information Systems and Management, Information Systems
Cited by
2 articles.
1. A Novel Energy Optimization Method in Construction Engineering;Lecture Notes on Data Engineering and Communications Technologies;2024
2. A Method for Extracting Sensitive Information from Long Text Based on Natural Language Processing Technology;Proceedings of the 2023 International Conference on Communication Network and Machine Learning;2023-10-27