Web Harvesting

Published: 2018
Pages: 199-226
Container-title: The Dark Web

Author:

Umamageswari B.¹, Kalpana R.²

Affiliation:

1. New Prince Shri Bhavani College of Engineering and Technology, India

2. Pondicherry Engineering College, India

Abstract

Web mining operates on large amounts of data extracted from the World Wide Web. Researchers have developed several state-of-the-art approaches for web data extraction, but the literature so far has focused mainly on techniques for data region extraction. Applications that consume the extracted data require data spread across multiple web pages, which should be crawled automatically. For this to happen, not only the data regions but also the navigation links must be extracted. Data extraction techniques are designed for specific HTML tags, which calls into question their universal applicability to information extraction from differently formatted web pages. This chapter covers the web data extraction techniques available for different kinds of data-rich pages, a classification of those techniques, and a comparison of them across many useful dimensions.

Publisher

IGI Global

References: 46 articles.

1. Arasu, A. (2003). Extracting structured data from Web pages. Proceedings of the ACM SIGMOD International Conference on Management of Data.

2. Baumgartner, R., Gatterbauer, W., & Gottlob, G. (2009). Web data extraction system. Encyclopedia of Database Systems, 3465-3471.

3. Towards a unified solution

4. Bolin, M. (2005). End-user programming for the web. (Master’s thesis). Massachusetts Institute of Technology.

5. Cai, D., Yu, S., Wen, J.-R., & Ma, W.-Y. (2003). Extracting Content Structure for Web Pages based on Visual Representation. In Proc. Fifth Asia Pacific Web Conf. (APWeb).

Cited by 2 articles.

1. A worldwide comparison of long-distance running training in 2019 and 2020: associated effects of the COVID-19 pandemic. PeerJ, 2022-03-25.

2. An Empirical Study of Deep Web based on Graph Analysis. SSRN Electronic Journal, 2020.