Abstract
Data-as-a-Service (DaaS) is a branch of cloud computing that supports “querying the Web”. Because of its ultrahigh scale, it is essential to establish rules for defining resource costs and guidelines for infrastructure investments. Those decisions should prioritize minimizing the incidence of agreement breaches that compromise the performance of cloud services, while optimizing resource usage and service cost. This article addresses the cost problem of DaaS by developing a model that optimizes the cost of querying distributed data sources over virtual machines (VMs) spread across multisite data centers. We have designed and analyzed a cost model for DaaS and implemented a scheduling system that performs cost-based VM assignment. To validate our model, we have studied and characterized the network and processing workloads of a real-world DaaS system. On average, our cost-based scheduling performs at least twice as well as the traditional round-robin approach. Our model also supports load balancing and infrastructure scalability when combined with an adaptive cost scheme that prioritizes VM allocation within underutilized data centers and avoids sending VMs to data centers that are on the verge of becoming overutilized.
Subject
General Computer Science, Theoretical Computer Science