Abstract
A sitemap is an explicit specification of the design concept and knowledge organization of a website and is therefore considered the website's basic ontology. It not only presents the main usage flows for users, but also organizes the concepts of the website hierarchically. Typically, sitemaps are defined by webmasters in the very early stages of website design. However, over their lifetime websites change significantly in structure, content and possible navigation paths. Even when this is not the case, webmasters may define sitemaps that do not reflect the actual website content or, conversely, organize pages and links in a way that does not reflect the intended organization of the content encoded in the sitemap. In this paper we propose an approach that generates sitemaps automatically. Unlike other approaches proposed in the literature, which mainly generate sitemaps from the textual content of the pages, in this work sitemaps are generated by analyzing the Web graph of a website. This allows us to: i) automatically generate a sitemap on the basis of possible navigation paths, ii) compare the generated sitemap with either the sitemap provided by the Web designer or the intended sitemap of the website and, consequently, iii) plan a possible website re-organization. The solution we propose is based on closed frequent sequence extraction and concentrates only on hyperlinks organized in "Web lists", which are logical lists embedded in the pages. These "Web lists" are typically used to support users in website navigation and include menus, navbars and content tables. Experiments performed on three real datasets show that the extracted sitemaps are much more similar to those defined by website curators than those obtained by competitor algorithms.
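To make the idea of closed frequent sequence extraction over Web-list links concrete, here is a minimal sketch in Python. It is an illustration under strong assumptions, not the algorithm of the paper: the abstract does not say how navigation sequences are produced or which mining algorithm is used, so the sketch enumerates all paths from the home page exhaustively and mines only path prefixes. The names `navigation_paths`, `closed_frequent_prefixes`, `web_lists`, `max_len` and `min_support`, as well as the toy site, are hypothetical.

```python
from collections import Counter


def navigation_paths(web_lists, home, max_len):
    """Enumerate paths from `home` that follow only Web-list links.

    `web_lists` maps a page URL to the Web lists found on that page,
    each Web list being a list of linked URLs (assumed input format).
    A path never revisits a page and has at most `max_len` pages.
    """
    paths = []
    stack = [(home,)]
    while stack:
        path = stack.pop()
        # Candidate next pages: links in any Web list of the last page,
        # de-duplicated and excluding pages already on the path.
        nxt = list(dict.fromkeys(
            url
            for web_list in web_lists.get(path[-1], [])
            for url in web_list
            if url not in path
        ))
        if len(path) == max_len or not nxt:
            paths.append(path)
            continue
        for url in nxt:
            stack.append(path + (url,))
    return paths


def closed_frequent_prefixes(paths, min_support):
    """Return closed frequent path prefixes with their support.

    Simplification: only prefixes starting at the home page are mined.
    A prefix is frequent if at least `min_support` paths start with it,
    and closed if no one-step extension has the same support.
    """
    support = Counter()
    for path in paths:
        for k in range(1, len(path) + 1):
            support[path[:k]] += 1
    frequent = {s: c for s, c in support.items() if c >= min_support}
    closed = {}
    for seq, count in frequent.items():
        absorbed = any(
            len(other) == len(seq) + 1
            and other[:len(seq)] == seq
            and other_count == count
            for other, other_count in frequent.items()
        )
        if not absorbed:
            closed[seq] = count
    return closed


# Toy website: the home page has one menu; each section has one list.
site = {
    "/": [["/a", "/b"]],
    "/a": [["/a/1", "/a/2"]],
    "/b": [["/b/1"]],
}
paths = navigation_paths(site, "/", max_len=3)
for seq, count in sorted(closed_frequent_prefixes(paths, 1).items()):
    print(count, " > ".join(seq))
```

In this toy example the prefix `/ > /b` is not closed, because its only extension `/ > /b > /b/1` has the same support; the closed prefixes that remain share the home page as a common root and can be read as branches of a candidate sitemap hierarchy.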
Funder
Seventh Framework Programme
Horizon 2020 Framework Programme
Publisher
Springer Science and Business Media LLC
Subject
Computer Networks and Communications, Hardware and Architecture, Software
Cited by
7 articles.