Large-scale parallel-processing systems, such as data centers and cloud networks, are the factories of today's digital world. Resource management policies, and load balancing algorithms (LBAs) in particular, play a pivotal role in these systems: they must meet highly stringent delay requirements while incurring only low overhead in large-scale deployments. An emerging challenge in designing LBAs for modern data centers is the data locality constraint, which governs which tasks can be assigned to which servers. These constraints are naturally modeled as a bipartite graph between the servers and the various task types. Existing heuristics for large-scale LBAs predominantly rely on the validity of the mean-field approximation. However, the non-exchangeability among servers induced by data locality breaks the mean-field approximation framework. Consequently, the empirical behavior of these systems differs drastically from what existing wisdom predicts. In this talk, we will discuss some recent foundational progress that we have made in understanding LBAs in the presence of data locality. In particular, we will talk about how to design resource-efficient, asymptotically optimal data locality constraints, and how the system behavior changes fundamentally depending on whether the above bipartite graph is an expander, a spatial graph, or inhomogeneous in nature.
Based on joint work with Daan Rutten, Zhisheng Zhao, and Ruoyu Wu.
Debankur Mukherjee is an Assistant Professor in the H. Milton Stewart School of Industrial and Systems Engineering at the Georgia Institute of Technology. Before joining Georgia Tech in 2019, he was a Prager Assistant Professor for a year in the Division of Applied Mathematics at Brown University. Debankur received his Ph.D. in Stochastic Operations Research from the Eindhoven University of Technology in the Netherlands. Debankur’s research is in applied probability, at the interface of stochastic processes and computer science, with applications to performance analysis, online algorithms, and machine learning. His primary focus is to develop a foundational understanding of the challenges that arise in large-scale systems, such as data centers and cloud networks. His work was a finalist in the INFORMS JFIG paper competition in 2022 and received the Best Student Paper Award at ACM SIGMETRICS 2018. His research has been funded by the NSF, and he currently serves on the editorial boards of Stochastic Systems and QUESTA.