An XML based Web Crawler with Page Revisit Policy and Updation in Local Repository of Search Engine
Published:2018-06-23
Issue:3
Volume:7
Page:1119
ISSN:2227-524X
Container-title:International Journal of Engineering & Technology
Short-container-title:IJET
Author:
Mor Jyoti,Dinesh Rai Dr,Naresh Kumar Dr
Abstract
In a large collection of web pages, it is difficult for search engines to keep their online repository up to date. Major search engines run hundreds of web crawlers that crawl the WWW day and night and send the downloaded web pages over a network to be stored in the search engine’s database. This results in overutilization of network resources such as bandwidth and CPU cycles. This paper proposes an architecture that tries to reduce the utilization of shared network resources with the help of an advanced XML-based approach. The focused-crawling architecture is trained to download only high-quality data from the internet, leaving behind web pages that are not relevant to the desired domain. A detailed layout of the proposed system is described; the system is capable of reducing the load on the network and reducing the problems that arise from the residency of a mobile agent at the remote server.
Publisher
Science Publishing Corporation
Subject
Hardware and Architecture, General Engineering, General Chemical Engineering, Environmental Engineering, Computer Science (miscellaneous), Biotechnology
Cited by
4 articles.