Affiliation:
1. Department of Computer and Control Engineering, Rzeszow University of Technology, Powstancow Warszawy 12, 35-959 Rzeszow, Poland
Abstract
As containerized web systems attract growing research interest, the need for effective analytical methods has increased, with an emphasis on efficiency and cost reduction. Web client simulation tools have been used to this end. Although machine learning (ML) methods are widely applied to anomaly detection in requests, predicting patterns in web datasets remains a complex task. Prior approaches that incorporate URLs, web page content, and auxiliary features have not produced satisfactory results, nor have they significantly improved the understanding of client behavior and the variety of request types. To overcome these shortcomings, this study introduces an incremental approach to request categorization. It examines several established classification techniques and assesses their performance on a selected dataset to determine the most effective model. The dataset comprises 8 million distinct records, each defined by performance metrics. After training and testing multiple algorithms from the CART family, Extreme Gradient Boosting proved to be the best-performing model. It maintains high prediction accuracy even for unrecognized requests, reaching 97% across diverse datasets. These results underline the strong performance of Extreme Gradient Boosting relative to other ML techniques and provide insights for efficient request categorization in web-based systems.
Subject
Electrical and Electronic Engineering, Computer Networks and Communications, Hardware and Architecture, Signal Processing, Control and Systems Engineering