Abstract
Thanks to the rapid expansion of the Internet, anyone can now access a vast array of information online. However, as the volume of web content continues to grow, search engines face challenges in delivering relevant results. Early search engines relied primarily on the words or phrases found within web pages to index and rank them. While this approach had its merits, it often produced irrelevant or inaccurate results. To address this issue, more advanced search engines began incorporating the hyperlink structure of web pages to help determine their relevance. This method improved retrieval accuracy to some extent, but it remained limited because it did not consider the actual content of web pages. The objective of this work is to enhance Web Information Retrieval methods by combining three key components: text content analysis, link analysis, and log file analysis. By integrating insights from these data sources, the goal is to achieve a more accurate and effective ranking of relevant web pages in the retrieved document set, ultimately improving the user experience and delivering more precise search results. The proposed system was tested with both single-word and multi-word queries, and the results were evaluated using relative recall, precision, and F-measure. Compared with Google’s PageRank algorithm, the proposed system demonstrated superior performance, achieving 81% mean average precision, 56% average relative recall, and a 66% F-measure.
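The three reported figures can be related to one another with a short calculation. The sketch below assumes that the F-measure is the balanced harmonic mean (F1) of the reported mean average precision and average relative recall; the abstract does not state the exact formula, so this is an illustrative consistency check rather than the authors' evaluation code. The function name `f_measure` is hypothetical.

```python
def f_measure(precision: float, recall: float) -> float:
    """Balanced F-measure (F1): the harmonic mean of precision and recall.

    Assumption: the abstract does not give its formula; F1 is used here
    because it is the most common form of the F-measure.
    """
    if precision + recall == 0:
        return 0.0  # avoid division by zero when both scores are zero
    return 2 * precision * recall / (precision + recall)


# Figures reported in the abstract, expressed as fractions.
reported_precision = 0.81  # mean average precision
reported_recall = 0.56     # average relative recall

f = f_measure(reported_precision, reported_recall)
print(f"F-measure: {f:.2%}")  # prints "F-measure: 66.22%"
```

Under this assumption the result rounds to 66%, which agrees with the F-measure reported above.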