Affiliation:
1. Computer Science and Information Engineering, National Central University, Taoyuan City, Taiwan
Abstract
Web logs have been widely used to represent the web page visits of online users. However, we found that the web logs in Chrome's browsing history record only 57% of users' visited websites; that is, nearly half of a user's website visits go unrecorded. Additionally, 5.1% of the visits recorded in the web log result from unconscious user actions; that is, these page visits are not initiated by the users themselves. We created a Google Chrome plugin and recruited users to install it so that we could collect and analyze conscious URL visits, unconscious URL visits, and "missing" URL visits (i.e., visits that are not recorded in the traditional web log). We report statistics on these behaviors. We show that the ranking of popular website categories derived from traditional web logs differs from the rankings obtained when missing visits are included or unintentional visits are excluded. We predicted users' future behaviors using three types of training data: all visits in modern web logs, only the intentional visits in web logs, and the intentional visits in web logs plus the missing visits. The experimental results indicate that missing visits may contain additional information, whereas unintentional visits in web logs may contribute more noise than information to user modeling. Consequently, observations and conclusions derived from web log analyses should be treated with caution, because web log data may be an incomplete and noisy record of the web pages a user visits.
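To make the three training-data views and the category-ranking comparison concrete, here is a minimal sketch under stated assumptions. The `Visit` record, its `in_web_log` and `intentional` flags, the view names, and the toy visits are all hypothetical illustrations, not the authors' plugin schema or data. Ties in the ranking are broken alphabetically so the output is deterministic.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Visit:
    """Hypothetical record of one page visit captured by a browser plugin."""
    url: str
    category: str
    in_web_log: bool   # True if the visit also appears in the browser history
    intentional: bool  # True if the visit was initiated by a conscious user action


def build_views(visits):
    """Build the three training-data views described in the abstract (assumed definitions)."""
    logged = [v for v in visits if v.in_web_log]
    return {
        # 1. Everything the traditional web log records.
        "all_logged": logged,
        # 2. Only the intentional visits that the web log records.
        "intentional_logged": [v for v in logged if v.intentional],
        # 3. Intentional logged visits plus the visits missing from the web log.
        "intentional_plus_missing": [
            v for v in visits if (v.in_web_log and v.intentional) or not v.in_web_log
        ],
    }


def rank_categories(view):
    """Rank categories by visit count (descending), breaking ties by name."""
    counts = Counter(v.category for v in view)
    return [cat for cat, _ in sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))]


# Toy, made-up visits chosen only to show that the rankings can diverge.
visits = [
    Visit("news.example/a", "news", True, True),
    Visit("news.example/b", "news", True, False),   # e.g., an automatic redirect
    Visit("ads.example/x", "ads", True, False),
    Visit("ads.example/y", "ads", True, False),
    Visit("ads.example/z", "ads", True, False),
    Visit("video.example/1", "video", False, True),  # missing from the web log
    Visit("video.example/2", "video", False, True),  # missing from the web log
    Visit("video.example/3", "video", True, True),
    Visit("shop.example/1", "shop", True, True),
]

if __name__ == "__main__":
    for name, view in build_views(visits).items():
        print(name, rank_categories(view))
```

With this toy input, the unintentional "ads" visits dominate the ranking built from the raw log, while the view that adds missing visits puts "video" first. This mirrors the kind of divergence the abstract describes, but it says nothing about the paper's actual measurements.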
Funder
Ministry of Science and Technology of Taiwan
Publisher
Association for Computing Machinery (ACM)
Subject
Information Systems and Management,Information Systems
Cited by
1 article.