An XML based Web Crawler with Page Revisit Policy and Updation in Local Repository of Search Engine

Published:2018-06-23 Issue:3 Volume:7 Page:1119
ISSN:2227-524X
Container-title:International Journal of Engineering & Technology
Short-container-title:IJET

Author:

Mor Jyoti,Dinesh Rai Dr,Naresh Kumar Dr

Abstract

In a large collection of web pages, it is difficult for search engines to keep their online repository up to date. Major search engines run hundreds of web crawlers that crawl the WWW day and night and send the downloaded web pages over a network to be stored in the search engine’s database. This results in overutilization of resources such as network bandwidth and CPU cycles. This paper proposes an architecture that tries to reduce the utilization of shared network resources with the help of an advanced XML-based approach. This focused-crawling architecture is trained to download only high-quality data from the internet, leaving behind web pages that are not relevant to the desired domain. A detailed layout of the proposed system is described; it is capable of reducing the load on the network and mitigating the problems that arise from the residency of a mobile agent at the remote server.

Publisher

Science Publishing Corporation

Subject

Hardware and Architecture,General Engineering,General Chemical Engineering,Environmental Engineering,Computer Science (miscellaneous),Biotechnology

Cited by 4 articles.

1. Key Technologies of Building Knowledge Base Based on Oracle Database;Lecture Notes on Data Engineering and Communications Technologies;2024

2. EMACrawler: Web Arama Motoru Veritabanı Tazeliği Optimizasyonu;Journal of Polytechnic;2023-11-15

3. An Empirical Framework for Recommendation-based Location Services Using Deep Learning;Engineering, Technology & Applied Science Research;2022-10-02

4. Technical Job Recommendation System Using APIs and Web Crawling;Computational Intelligence and Neuroscience;2022-06-21