Ph.D. Candidate, Massachusetts Institute of Technology
ECE Faculty Candidate Seminar
Wednesday, April 5, 2023, 9:00-10:00am
B02 CSL and Online via Zoom
Title: Networked Machine Learning Systems: Edge, Cloud, and Applications
Abstract: Machine learning (ML) requires significant computing power to achieve state-of-the-art accuracy, often beyond the capacity of a single computing device. I will talk about networked ML systems and their applications, their unique challenges in providing efficient access to larger computing resources, and systems we have developed for two settings: real-time neural network inference at the edge and large-scale training in the cloud. These settings are representative of the challenges at the two ends of the spectrum in device connectivity, power, and heterogeneity.
An essential theme in our approach is the joint optimization of the application, compute, and network stack for ML workloads. For cloud training, I will present SiP-ML, an end-to-end optical networking system designed explicitly for ML workloads. SiP-ML jointly optimizes the optical interconnect topology and the distributed training strategy, enabling thousands of GPUs to efficiently leverage multiple terabits per second of communication bandwidth. For edge-based inference applications, I will introduce adaptive model streaming (AMS), a new approach for continuously adapting, remotely over the network, lightweight models deployed at the edge. AMS dynamically updates the deployed models multiple times a minute to specialize them for the particular data distribution encountered at each device. AMS enables highly accurate video analytics while maintaining real-time latency (< 30 ms) on a typical mobile phone and consuming less bandwidth than a FaceTime audio call. Finally, I will present RECL, a system that improves AMS scalability by learning to reuse adapted models across streams, reducing adaptation costs and compute time. RECL automatically develops a better model reuse mechanism over time, becoming more resource-efficient as the system scales.
Bio: Mehrdad Khani is a Ph.D. candidate at MIT working with Prof. Mohammad Alizadeh. His research interests are broadly in computer systems, applied machine learning, and networks, with a recent focus on networked machine learning systems. A shared theme in his work is to identify the practical characteristics of a system across its abstraction layers, from the physical layer to the application, and to build solutions that holistically optimize end-to-end system performance. With a strong emphasis on practicality, he has collaborated with and published in various communities, from computer vision and machine learning to networking and systems to wireless communication and signal processing. Before joining MIT, he received his B.Sc. degree in Electrical Engineering from Sharif University of Technology, where he also obtained a B.Sc. in Computer Science.