Authors:
Kaleel Parisa Begum, Sheen Shina
Abstract
General search engines serve a diverse user base with varied purposes, so they return results across a wide range of topics and content categories on the web. Focused Crawlers (FC), by contrast, deliver more specialized and targeted results within particular domains or verticals. For a vertical search engine, the performance of the focused crawler is critical, and several approaches to improving it have been applied. We propose an intelligent focused crawler that uses Reinforcement Learning (RL) to prioritize hyperlinks for long-term profit. Our implementation differs from other RL-based work in that it encourages learning at an early stage by using a decaying ϵ-greedy policy to select the next link, which enables the crawler to use the experience it gains to improve its performance and reach more relevant pages. With infertility rates rising worldwide, many people need information about infertility issues and the artificial reproduction treatments available. We therefore take the infertility domain as a case study and collect web pages from scratch. We compare the performance of crawling tasks that follow ϵ-greedy and decaying ϵ-greedy policies. Experimental results show that crawlers following a decaying ϵ-greedy policy perform better.
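As a rough illustration of the link-selection idea described in the abstract, the following is a minimal sketch of a decaying ϵ-greedy choice over a crawl frontier. It is not the authors' implementation: the exponential decay schedule, the parameter values (eps_start, eps_min, decay), the function names, and the representation of the frontier as a mapping from URL to estimated long-term value are all assumptions made for this example.

```python
import random


def decayed_epsilon(step, eps_start=1.0, eps_min=0.05, decay=0.99):
    """Exploration rate at a given crawl step.

    Assumed schedule: exponential decay from eps_start, floored at eps_min.
    The abstract does not specify the actual schedule or constants.
    """
    return max(eps_min, eps_start * decay ** step)


def select_link(frontier, step, rng=random):
    """Pick the next URL from a non-empty {url: estimated_value} frontier.

    With probability epsilon the crawler explores (uniformly random link);
    otherwise it exploits (link with the highest estimated value).
    Because epsilon shrinks with step, early steps favor exploration and
    later steps favor the learned estimates.
    """
    eps = decayed_epsilon(step)
    if rng.random() < eps:
        return rng.choice(list(frontier))   # explore
    return max(frontier, key=frontier.get)  # exploit


if __name__ == "__main__":
    # Hypothetical frontier with made-up value estimates.
    frontier = {
        "https://example.org/ivf-overview": 0.82,
        "https://example.org/contact": 0.05,
        "https://example.org/treatment-options": 0.67,
    }
    for step in (0, 100, 1000):
        print(step, round(decayed_epsilon(step), 3), select_link(frontier, step))
```

A plain ϵ-greedy policy corresponds to keeping the exploration rate constant (in this sketch, setting decay=1.0), which is the baseline the abstract compares against.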
Cited by
1 article.