Affiliation:
1. H. Milton Stewart School of Industrial and Systems Engineering, Georgia Institute of Technology, Atlanta, Georgia 30332;
2. Department of Mathematics, Iowa State University, Ames, Iowa 50011
Abstract
We consider load balancing in large-scale heterogeneous server systems in the presence of data locality that imposes constraints on which tasks can be assigned to which servers. The constraints are naturally captured by a bipartite graph between the servers and the dispatchers handling assignments of various arrival flows. When a task arrives, the corresponding dispatcher assigns it to a server with the shortest queue among [Formula: see text] randomly selected servers obeying these constraints. Server processing speeds are heterogeneous and depend on the server type. For a broad class of bipartite graphs, we characterize the limit of the appropriately scaled occupancy process, both at the process level and in steady state, as the system size becomes large. Using this characterization, we show that imposing data locality constraints can significantly improve the performance of heterogeneous systems. This is in stark contrast to either heterogeneous servers in a fully flexible system or data locality constraints in systems with homogeneous servers, both of which have been observed to degrade system performance. Extensive numerical experiments corroborate the theoretical results.
Funding: This work was partially supported by the National Science Foundation [CCF. 07/2021–06/2024].
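The dispatching rule described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `assign_task`, the dictionary `neighbors` encoding the bipartite graph, the parameter name `d` for the number of sampled servers, sampling without replacement, and the tie-breaking behavior are all assumptions made for illustration.

```python
import random


def assign_task(dispatcher, neighbors, queue_lengths, d, rng=random):
    """Send one task to the shortest queue among d sampled compatible servers.

    neighbors maps each dispatcher to the list of servers it may use
    (the edges of the bipartite graph). All names here are hypothetical.
    """
    compatible = neighbors[dispatcher]
    # Assumption: sample without replacement, and sample every compatible
    # server when fewer than d are available.
    sampled = rng.sample(compatible, min(d, len(compatible)))
    # Assumption: ties go to the first sampled server with the minimum queue;
    # the sample order is itself random.
    chosen = min(sampled, key=lambda s: queue_lengths[s])
    queue_lengths[chosen] += 1
    return chosen


# Two dispatchers and four servers; dispatcher "B" can only reach servers 2 and 3.
neighbors = {"A": [0, 1, 2], "B": [2, 3]}
queue_lengths = [4, 1, 3, 0]
# With d=2, both of B's servers are sampled, so the idle server 3 is chosen.
server = assign_task("B", neighbors, queue_lengths, d=2)
```

Heterogeneous processing speeds do not appear in this sketch because the rule as stated compares queue lengths only; the speeds would affect how quickly each queue drains in a full simulation.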
Publisher
Institute for Operations Research and the Management Sciences (INFORMS)
Subject
Management Science and Operations Research; Statistics, Probability and Uncertainty; Modeling and Simulation; Statistics and Probability