Affiliation:
1. Middle East Technical University, Ankara, Turkey
Abstract
Extracting patterns from web usage data helps to facilitate better web personalization and web structure readjustment. The classical frequency-based sequence mining techniques consider only the binary occurrences of web pages in sessions that result in the extraction of many patterns that are not informative for users. To handle this problem, utility-based mining technique has emerged, which assigns non-binary values, called utilities, to web pages and calculates pattern utilities accordingly. However, the utility of a pattern cannot always be determined from distinct web page utilities. For instance, the number of distinct users that traverse an extracted pattern or some demographic data about those users may affect the value of the extracted patterns. However, such information cannot be calculated directly from web page utilities. In this paper, we present a new approach based on a user-defined scoring mechanism so as to extract patterns from web log data. The proposed approach can limit the size of the search space; therefore it has the ability to extract patterns even for large and sparse datasets. The framework is hybrid in the sense that it combines clustering with a heuristic-based pattern extraction algorithm. Substantial experiments on real datasets show that the proposed solution effectively discovers patterns under user-defined evaluation.
Subject
Library and Information Sciences,Information Systems
Cited by
4 articles.
订阅此论文施引文献
订阅此论文施引文献,注册后可以免费订阅5篇论文的施引文献,订阅后可以查看论文全部施引文献