
A web crawler design for data mining

Published: 2001-10 Volume: 27 Issue: 5 Pages: 319-325
ISSN: 0165-5515
Journal: Journal of Information Science
Language: en

Author:

Mike Thelwall¹

Affiliation:

1. University of Wolverhampton, Wolverhampton, UK

Abstract

The content of the web has increasingly become a focus for academic research. Computer programs are needed in order to conduct any large-scale processing of web pages, requiring the use of a web crawler at some stage in order to fetch the pages to be analysed. The processing of the text of web pages in order to extract information can be expensive in terms of processor time. Consequently a distributed design is proposed in order to effectively use idle computing resources and to help information scientists avoid the need to employ dedicated equipment. A system developed using the model is examined and the advantages and limitations of the approach are discussed.
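The abstract describes the distributed design only at a high level. As an illustration of the general idea, and not the paper's implementation, the following sketch has a central coordinator that hands out URLs to otherwise idle client machines; each client fetches and processes a page, then sends back a small result plus any newly found links. Assumptions: threads stand in for separate computers, a small in-memory dictionary (`FAKE_WEB`) replaces real HTTP fetching, and a word count stands in for the processor-heavy text analysis. All names (`FAKE_WEB`, `Coordinator`, `client`, `crawl`) are hypothetical.

```python
import queue
import re
import threading
from typing import Dict, Iterable, List, Optional, Set

# Hypothetical in-memory "web" so the sketch runs offline.
# A real client would fetch each URL over HTTP instead.
FAKE_WEB: Dict[str, str] = {
    "http://a.example/": '<a href="http://b.example/">b</a> <a href="http://c.example/">c</a>',
    "http://b.example/": '<a href="http://c.example/">c</a> some text',
    "http://c.example/": '<a href="http://a.example/">a</a> <a href="http://d.example/">d</a>',
    "http://d.example/": "no links here",
}

LINK_RE = re.compile(r'href="([^"]+)"')
TAG_RE = re.compile(r"<[^>]+>")


class Coordinator:
    """Central server: owns the URL frontier, the set of assigned URLs and the results."""

    def __init__(self, seeds: Iterable[str]) -> None:
        self.frontier: "queue.Queue[Optional[str]]" = queue.Queue()
        self.seen: Set[str] = set()
        self.results: Dict[str, int] = {}
        self.lock = threading.Lock()
        for url in seeds:
            self.submit(url)

    def submit(self, url: str) -> None:
        # Hand each URL out at most once, however many pages link to it.
        with self.lock:
            if url in self.seen:
                return
            self.seen.add(url)
        self.frontier.put(url)

    def report(self, url: str, word_count: int, links: List[str]) -> None:
        # A client returns its small analysis result plus newly found links.
        with self.lock:
            self.results[url] = word_count
        for link in links:
            self.submit(link)


def client(coord: Coordinator) -> None:
    """One idle machine: repeatedly take a URL, fetch it, process it, report back."""
    while True:
        url = coord.frontier.get()
        if url is None:  # stop signal from the coordinator
            coord.frontier.task_done()
            return
        page = FAKE_WEB.get(url)  # stand-in for the network fetch
        if page is not None:
            links = LINK_RE.findall(page)
            # Stand-in for the expensive text processing: count words outside tags.
            word_count = len(TAG_RE.sub(" ", page).split())
            coord.report(url, word_count, links)
        # Marked done only after new links are queued, so join() cannot return early.
        coord.frontier.task_done()


def crawl(seeds: Iterable[str], n_clients: int = 3) -> Dict[str, int]:
    coord = Coordinator(seeds)
    threads = [threading.Thread(target=client, args=(coord,)) for _ in range(n_clients)]
    for t in threads:
        t.start()
    coord.frontier.join()  # wait until every assigned URL has been processed
    for _ in threads:
        coord.frontier.put(None)  # one stop signal per client
    for t in threads:
        t.join()
    return coord.results


if __name__ == "__main__":
    print(crawl(["http://a.example/"]))
```

Starting from `http://a.example/`, the four linked pages are each processed exactly once, because the coordinator alone decides which URLs have been assigned. Keeping deduplication central while pushing fetching and analysis to the clients is one plausible way to spread processor-heavy work across spare machines; the paper itself should be consulted for how its system actually divides the work.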

Publisher

SAGE Publications

Subject

Library and Information Sciences, Information Systems

Link

http://journals.sagepub.com/doi/pdf/10.1177/016555150102700503

References (30 articles)

1. The anatomy of a large-scale hypertextual Web search engine

2. Authoritative sources in a hyperlinked environment

3. How did university departments interweave the Web: A study of connectivity and underlying factors

Cited by 111 articles

1. Data Collection in Online Terrorism and Extremism Research: Strengths, Limitations, and Future Directions; Studies in Conflict & Terrorism; 2024-06-18

2. Policy interventions and market innovation in rural China: Empirical evidence from Taobao villages; Economic Analysis and Policy; 2024-03

3. Design a Data Analytics Training System to Explore Behavioral Intention and Immersion for Internal Enterprise Education; Journal of Organizational and End User Computing; 2024-02-07

4. Communication Model of Digital Brands on Internet Using Network Analysis Algorithm; 2024 IEEE 7th Eurasian Conference on Educational Innovation (ECEI); 2024-01-26

5. Supporting the Demand on Mental Health Services with AI-Based Conversational Large Language Models (LLMs); BioMedInformatics; 2023-12-22