Affiliation:
1. Department of Computer Science, Fatima College, Madurai, India
Abstract
Information available on the internet is vast, diverse, and dynamic. Because an enormous amount of information is available online, finding similarity between webpages using efficient hyperlink analysis is a challenging task. In this article, the researcher proposes an improved PageSim algorithm that measures the importance of a webpage based on the PageRank values of its connected webpages. The proposed algorithm therefore uses heterogeneous propagation of the PageRank score, based on the prestige measure of each webpage. The existing and improved PageSim algorithms are implemented on a sample web graph. Real-time citation networks, namely the ZEWAIL Citation Network and the DBLP Citation Network, are used to test and compare the two algorithms. With the proposed algorithm, the similarity score between two different webpages increases significantly with their common information features and decreases significantly with their distinct factors.
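The abstract does not give the algorithm's equations, so the following is only an illustrative sketch of the general idea, not the author's method. It assumes that each page pushes its PageRank score along out-links for a fixed number of hops, that "heterogeneous" propagation means splitting the score among out-neighbours in proportion to their PageRank instead of evenly, and that similarity is a Jaccard-style overlap of the scores two pages receive. The function names, the `decay` and `radius` parameters, and the sample graph are all hypothetical.

```python
from collections import defaultdict


def pagerank(graph, damping=0.85, iterations=50):
    """Power-iteration PageRank on a dict {page: [out-neighbours]}."""
    nodes = list(graph)
    n = len(nodes)
    rank = {u: 1.0 / n for u in nodes}
    for _ in range(iterations):
        new = {u: (1.0 - damping) / n for u in nodes}
        for u in nodes:
            outs = graph[u]
            # A page without out-links spreads its rank over every page.
            targets = outs if outs else nodes
            share = damping * rank[u] / len(targets)
            for v in targets:
                new[v] += share
        rank = new
    return rank


def propagate(graph, rank, source, decay=0.5, radius=2, weighted=False):
    """Push the PageRank of `source` along out-links for `radius` hops.

    weighted=False splits a score evenly among out-neighbours (assumed
    baseline); weighted=True splits it in proportion to each neighbour's
    PageRank (assumed reading of "heterogeneous propagation").
    Returns {page: score received from source}.
    """
    received = defaultdict(float)
    received[source] = rank[source]
    frontier = {source: rank[source]}
    for _ in range(radius):
        nxt = defaultdict(float)
        for u, score in frontier.items():
            outs = graph[u]
            if not outs:
                continue
            total = sum(rank[v] for v in outs)
            for v in outs:
                frac = rank[v] / total if weighted else 1.0 / len(outs)
                nxt[v] += decay * score * frac
        for v, s in nxt.items():
            received[v] += s
        frontier = nxt
    return received


def similarity(graph, rank, a, b, **kwargs):
    """Jaccard-style overlap of the scores that pages a and b receive."""
    num = den = 0.0
    for u in graph:
        scores = propagate(graph, rank, u, **kwargs)
        x, y = scores.get(a, 0.0), scores.get(b, 0.0)
        num += min(x, y)  # score both pages receive from u
        den += max(x, y)  # includes what only one of them receives
    return num / den if den else 0.0


# Hypothetical five-page web graph used only for illustration.
sample_graph = {
    "A": ["C", "D"],
    "B": ["C", "D"],
    "C": ["E"],
    "D": ["E"],
    "E": ["A"],
}
sample_rank = pagerank(sample_graph)
print(similarity(sample_graph, sample_rank, "C", "D", weighted=False))
print(similarity(sample_graph, sample_rank, "C", "D", weighted=True))
```

In this sketch, scores that both pages receive from the same source raise the numerator, while scores that only one page receives enlarge the denominator alone, which mirrors the abstract's description of similarity rising with common features and falling with distinct ones.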