Structured databases on the web

Published:2004-09 Issue:3 Volume:33 Page:61-70
ISSN:0163-5808
Container-title:ACM SIGMOD Record
language:en
Short-container-title:SIGMOD Rec.

Author:

Chang Kevin Chen-Chuan¹, He Bin¹, Li Chengkai¹, Patel Mitesh¹, Zhang Zhen¹

Affiliation:

1. University of Illinois at Urbana-Champaign

Abstract

The Web has been rapidly "deepened" by the prevalence of databases online. With the potentially unlimited information hidden behind their query interfaces, this "deep Web" of searchable databases is clearly an important frontier for data access. This paper surveys this relatively unexplored frontier, measuring characteristics pertinent to both exploring and integrating structured Web sources. On one hand, our "macro" study surveys the deep Web at large, in April 2004, adopting the random IP-sampling approach, with one million samples. (How large is the deep Web? How is it covered by current directory services?) On the other hand, our "micro" study surveys source-specific characteristics over 441 sources in eight representative domains, in December 2002. (How "hidden" are deep-Web sources? How do search engines cover their data? How complex and expressive are query forms?) We report our observations and publish the resulting datasets to the research community. We conclude with several implications (of our own) which, while necessarily subjective, might help shape research directions and solutions.

Publisher

Association for Computing Machinery (ACM)

Subject

Information Systems, Software

Link

https://dl.acm.org/doi/pdf/10.1145/1031570.1031584

Reference: 28 articles.

1. BrightPlanet.com. The deep web: Surfacing hidden value. Accessible at http://brightplanet.com, July 2000.

2. Accessibility of information on the web

3. Ed O'Neill, Brian Lavoie, and Rick Bennett. Web characterization. Accessible at http://wcp.oclc.org.

4. GNU. wget. Accessible at http://www.gnu.org/software/wget/wget.html.

5. Kevin Chen-Chuan Chang, Bin He, Chengkai Li, and Zhen Zhang. The UIUC web integration repository. Computer Science Department, University of Illinois at Urbana-Champaign. http://metaquerier.cs.uiuc.edu/repository, 2003.

Cited by 144 articles.

1. The Proposed Framework of View-Dependent Data Integration Architecture;Advances in Computational Intelligence and Robotics;2024-04-12

2. Natural Language Interface for Covid-19 Amharic Database Using LSTM Encoder Decoder Architecture with Attention;2021 International Conference on Information and Communication Technology for Development for Africa (ICT4DA);2021-11-22

3. Query interface schema extracting from deep web using ontology;2021 International Conference on Image, Video Processing, and Artificial Intelligence;2021-11-11

4. IHWC: intelligent hidden web crawler for harvesting data in urban domains;Complex & Intelligent Systems;2021-07-24

5. SmartCrawler: A Three-Stage Ranking Based Web Crawler for Harvesting Hidden Web Sources;Computers, Materials & Continua;2021