Abstract: Today's largest data processing workloads are hosted in cloud data centers. Due to exponential data growth and the end of Moore's Law, these workloads have grown to the hyperscale level, where a single query encompasses billions to trillions of data items spread across hundreds to thousands of servers connected by the data center network. These massive scales fundamentally challenge the designs of both data processing systems and data center networks. My research rethinks the interactions between these two layers and seeks optimal solutions for supporting data processing in data centers and for evolving the cloud infrastructure.
In this talk, I will present a principled, cross-layer approach to building network-centric systems for hyperscale workloads. My approach covers data processing in both current and future networks, as well as how networks evolve. To demonstrate its effectiveness, I will first discuss GraphRex, a system that combines classic database and systems techniques to push the performance of massive graph queries in current data centers. I will then introduce data processing in disaggregated data centers (DDCs), a promising new cloud architecture, and detail TELEPORT, a system that allows data processing systems to unlock the full benefits of DDCs. Finally, I will show MimicNet, a system that facilitates network innovation at scale.
Bio: Qizhen Zhang is a Ph.D. candidate in the Department of Computer and Information Science at the University of Pennsylvania, advised by Vincent Liu and Boon Thau Loo. His dissertation research bridges cloud data processing systems and data center networks to address emerging challenges in hyperscale data processing. He is broadly interested in data management, computer systems, and networking, and his research spans the data processing stack. His work has appeared at database and systems conferences such as SIGMOD, VLDB, and SIGCOMM.