Fast prediction of web user browsing behaviours using most interesting patterns

Published:2016-10-01 Issue:1 Volume:44 Page:74-90
ISSN:0165-5515
Container-title:Journal of Information Science
language:en
Short-container-title:Journal of Information Science

Author:

Sisodia Dilip Singh¹, Khandal Vijay¹, Singhal Riya¹

Affiliation:

1. Department of Computer Science & Engineering, National Institute of Technology Raipur, India

Abstract

The prediction of users’ browsing behaviours is essential for putting appropriate information on the web. Browsing behaviours are stored as navigational patterns in web server logs. These weblogs are mined for the frequently accessed patterns of web users, which can be used to predict user behaviour and to collect business intelligence. However, owing to the exponentially increasing weblog size, existing implementations of frequent-pattern-mining algorithms often take too much time and generate too many redundant patterns. This article introduces the most interesting pattern-based parallel FP-growth (MIP-PFP) algorithm. MIP-PFP is an improved implementation of the parallel FP-growth (PFP) algorithm and is implemented on the Apache Spark platform for extracting frequent patterns from huge weblogs. Experiments were performed on openly available National Aeronautics and Space Administration (NASA) weblog data to test the effectiveness of the MIP-PFP algorithm. The results were compared with existing implementations of the PFP algorithm. The results suggest that the MIP-PFP algorithm running on Apache Spark reduced the execution time by a factor of more than 10. The effect of the length of the sequences used as input to the MIP-PFP algorithm was also evaluated with different interestingness parameters, including support, confidence, lift, leverage, cosine and conviction. The experimental results show that only sequences of length greater than three produced a very low value of support for these interestingness measures.
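
The six interestingness measures named in the abstract have standard textbook definitions for a rule A -> C, and the sketch below computes them for a toy set of sessions. It is not the MIP-PFP implementation: it assumes each session is treated as an unordered set of pages, the function name `interestingness` is hypothetical, and the page paths are made up for illustration rather than taken from the NASA weblog.

```python
from math import inf, sqrt


def interestingness(sessions, antecedent, consequent):
    """Standard rule measures for antecedent -> consequent.

    Assumption: a session supports an itemset when it contains every page
    in it, regardless of order.
    """
    a, c = set(antecedent), set(consequent)
    n = len(sessions)
    # Fraction of sessions containing A, C, and A together with C.
    p_a = sum(a <= set(s) for s in sessions) / n
    p_c = sum(c <= set(s) for s in sessions) / n
    p_ac = sum((a | c) <= set(s) for s in sessions) / n
    if p_a == 0 or p_c == 0:
        raise ValueError("antecedent and consequent must each occur at least once")
    confidence = p_ac / p_a
    return {
        "support": p_ac,
        "confidence": confidence,
        "lift": confidence / p_c,
        "leverage": p_ac - p_a * p_c,
        "cosine": p_ac / sqrt(p_a * p_c),
        # Conviction is unbounded when the rule never fails.
        "conviction": inf if confidence == 1 else (1 - p_c) / (1 - confidence),
    }


# Hypothetical sessions; the paths are illustrative only.
sessions = [
    ["/index.html", "/shuttle/", "/history/"],
    ["/index.html", "/shuttle/"],
    ["/index.html", "/history/"],
    ["/shuttle/", "/history/"],
    ["/index.html", "/shuttle/", "/images/"],
]

measures = interestingness(sessions, ["/index.html"], ["/shuttle/"])
for name, value in measures.items():
    print(f"{name}: {value:.4f}")
```

In this toy data the rule has a lift below 1 and a negative leverage, meaning the two pages co-occur slightly less often than independence would predict. Measures like these are what allow a miner to discard frequent but uninteresting patterns.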

Publisher

SAGE Publications

Subject

Library and Information Sciences, Information Systems

Link

http://journals.sagepub.com/doi/pdf/10.1177/0165551516673293

Cited by 10 articles.

1. Web-S4AE: a semi-supervised stacked sparse autoencoder model for web robot detection;Neural Computing and Applications;2023-05-27

2. IRPDP_HT2: a scalable data pre-processing method in web usage mining using Hadoop MapReduce;Soft Computing;2023-03-24

3. Fine-Grained High-Utility Dynamic Fingerprinting Extraction for Network Traffic Analysis;Applied Sciences;2022-11-15

4. IRPDP_HT2: A Scalable Data Pre-processing Method in Web Usage Mining using Hadoop-MapReduce;2022-08-25

5. Web Usage Mining by Neural Hybrid Prediction with Markov Chain Components;Journal of Web Engineering;2021-07-19