Affiliation:
1. School of Mathematics, Statistics and Computer Science, University of KwaZulu-Natal, Durban 3630, South Africa
Abstract
Within the modern era, corporates are compelled to own an appealing and effective website to survive and thrive within the competitive global digital marketplace. Whilst there are several web metrics to focus on, a key focus area of web analytics is the level of drop-offs. The drop-off rate represents the proportion of visitors that prematurely drop-off a website. Whilst the exact reason behind the drop-off may only be assumed (could be due to the loss of Internet connectivity or dis-interest), this study attempted to identify the triggers behind website drop-offs through a survival problem. Each person entering the website, at a given instance, can view any number of web pages (such as home, contact us, about us, etc.). However, on the studied website, roughly one in five visitors have prematurely dropped-off. The study was conducted on an engineering corporate website with the data collected via the Google Analytics tracking tool. The aim was to determine the key hazards that contributed to the observed drop-off rate through the use of a cox proportional hazard model and a survival random forest model. On the studied website, based on empirical evidence, the online visitors were censored so that those who viewed three or more webpages within the visit were labelled as ‘survived’. Visitors who viewed two or less webpages before leaving the website were labelled as ‘did not survive’. Thereby, the ‘did not survive’ observations represented the visits that prematurely dropped off the website. Using the visitor’s physical and behavioral characteristics, as tracked by Google Analytics, the cox-proportional hazard and survival random forest models were employed to determine the hazards that influence survival. Visitor’s physical characteristics include the device used to access the website, geolocation at the time of the visit, number of previous visits, etc., whilst the behavioral characteristics include the landing page on website, level of engagement, whether entry into the website originated through an organic search or not. Whilst both models have identified similar features as being key hazards, the survival random forest model has been shown to out-perform on the non-linear features relative to the cox proportional hazard model and obtained a higher classification accuracy. During the validation process, the survival random forest model (63%) outperformed the cox model (58%) on classification accuracy. The features that were identified as hazardous indicated that some webpages needed further attention, the visitor’s level of engagement with the website (the degree of scrolling and clicks), the distance between a visitor’s location and the studied corporate’s location, the historic frequency of visiting the website, and if the website entry point was through an organic search. Whilst the study of drop-offs has been a commonly researched problem, this study details the investigation of key hazards through the use of survival models and compares the outcomes of a regression-based model to a machine learning survival model.
Subject
Fluid Flow and Transfer Processes,Computer Science Applications,Process Chemistry and Technology,General Engineering,Instrumentation,General Materials Science