Affiliation:
1. Politecnico di Torino, Torino, Italy
2. Nokia Bell Labs, Nozay, France
3. LaBRI, Université de Bordeaux, Talence, France
Abstract
Understanding how people interact with the web is key for a variety of applications, from the design of effective web pages to the definition of successful online marketing campaigns. Browsing behavior has traditionally been represented and studied by means of clickstreams, i.e., graphs whose vertices are web pages and whose edges are the paths followed by users. Obtaining large and representative data to extract clickstreams is, however, challenging.
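As a minimal sketch of the graph definition above, assume that each page a user requested is available together with the HTTP Referer of the page the user came from (a common but imperfect signal). A clickstream can then be stored as a set of vertices plus weighted directed edges. The `build_clickstream` function and the `(referer, url)` record format are hypothetical illustrations, not the authors' code.

```python
from collections import Counter
from typing import Iterable, Optional, Set, Tuple

# Hypothetical record format: (referer, url) for one page a user requested.
# referer is None when the visit has no referring page (e.g., a typed URL).
Visit = Tuple[Optional[str], str]


def build_clickstream(visits: Iterable[Visit]) -> Tuple[Set[str], Counter]:
    """Build a clickstream graph: vertices are pages, edges are transitions.

    Edge weights count how many times users moved from one page to another.
    """
    nodes: Set[str] = set()
    edges: Counter = Counter()
    for referer, url in visits:
        nodes.add(url)
        if referer is not None:
            nodes.add(referer)
            edges[(referer, url)] += 1  # directed edge referer -> url
    return nodes, edges


if __name__ == "__main__":
    log = [
        (None, "news.example/"),
        ("news.example/", "news.example/article1"),
        ("news.example/", "news.example/article2"),
        ("news.example/", "news.example/article1"),
    ]
    nodes, edges = build_clickstream(log)
    print(len(nodes))                                         # 3
    print(edges[("news.example/", "news.example/article1")])  # 2
```

Counting repeated transitions as edge weights, rather than keeping only distinct edges, preserves how often users follow each link.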
The evolution of the web raises the question of whether browsing behavior and, consequently, the properties of clickstreams are changing. This article presents a longitudinal study of clickstreams from 2013 to 2016. We evaluate an anonymized dataset of HTTP traces captured in a large ISP network to which thousands of households are connected. We first propose a methodology to identify the URLs actually requested by users among the massive set of requests automatically fired by browsers when rendering web pages. Then, we characterize web usage patterns and clickstreams, taking into account both their temporal evolution and the impact of the device used to explore the web. Our analyses precisely quantify various aspects of clickstreams and uncover interesting patterns, such as the typically short paths people follow while navigating the web, the rapidly growing trend of browsing from mobile devices, and the different roles of search engines and social networks in promoting content.
Finally, we contribute a dataset of anonymized clickstreams to the community to foster new studies.¹
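The abstract does not detail how actual user requests are separated from the requests that browsers fire automatically. The sketch below illustrates one simple heuristic of that kind, assuming that per-request timestamps, Content-Type and Referer are available in the trace. The `HttpRecord` type, the `user_actions` function and the 1-second `min_gap` threshold are hypothetical choices for illustration, not the methodology of this article.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class HttpRecord:
    # Hypothetical fields extracted from one HTTP request/response pair.
    timestamp: float          # seconds since trace start
    url: str
    content_type: str
    referer: Optional[str]


def user_actions(records: List[HttpRecord], min_gap: float = 1.0) -> List[HttpRecord]:
    """Heuristically keep requests that look like explicit user clicks.

    Illustrative assumptions (not the paper's method):
      * only HTML documents can be user actions; images, scripts, CSS are dropped;
      * an HTML request referred by the previous user action and arriving less
        than `min_gap` seconds after it is treated as an embedded frame
        (e.g., an ad iframe) fired automatically by the browser.
    """
    actions: List[HttpRecord] = []
    for rec in sorted(records, key=lambda r: r.timestamp):
        if not rec.content_type.lower().startswith("text/html"):
            continue  # embedded object, not a page view
        if actions:
            last = actions[-1]
            if rec.referer == last.url and rec.timestamp - last.timestamp < min_gap:
                continue  # likely an automatically loaded frame
        actions.append(rec)
    return actions


if __name__ == "__main__":
    trace = [
        HttpRecord(0.0, "http://news.example/", "text/html", None),
        HttpRecord(0.2, "http://news.example/logo.png", "image/png", "http://news.example/"),
        HttpRecord(0.4, "http://ads.example/frame", "text/html", "http://news.example/"),
        HttpRecord(9.0, "http://news.example/story", "text/html; charset=utf-8", "http://news.example/"),
    ]
    print([r.url for r in user_actions(trace)])
    # ['http://news.example/', 'http://news.example/story']
```

Real traces are noisier (redirects, single-page applications, missing Referer headers), so any fixed threshold like `min_gap` would need to be validated against ground truth.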
Funder
BigDAMA
Vienna Science and Technology Fund
Publisher
Association for Computing Machinery (ACM)
Subject
Computer Networks and Communications
Cited by
17 articles.
1. Share and Multiply: Modeling Communication and Generated Traffic in Private WhatsApp Groups;IEEE Access;2023
2. Toward practical defense against traffic analysis attacks on encrypted DNS traffic;Computers & Security;2023-01
3. Flow-Based User Click Identification in Encrypted Web Traffic;2022 IEEE Smartworld, Ubiquitous Intelligence & Computing, Scalable Computing & Communications, Digital Twin, Privacy Computing, Metaverse, Autonomous & Trusted Vehicles (SmartWorld/UIC/ScalCom/DigitalTwin/PriComp/Meta);2022-12
4. Routines and the Predictability of Day-to-Day Web Use;Media Psychology;2022-09-06
5. RLBrowse: Generating Realistic Packet Traces with Reinforcement Learning;NOMS 2022-2022 IEEE/IFIP Network Operations and Management Symposium;2022-04-25