Date: Monday, October 10
Time: 4:30pm - 5:30pm
Location: The seminar is hybrid. You can attend in person at Room 2124 (Siebel), or you can join via Zoom (https://illinois.zoom.us/j/89400978467?pwd=NjV3ZFQrQ1JidTNyS0ZNUEVOcEtpUT09)
Speaker(s): Mangpo Phothilimthana (External Speaker)
Title: Datacenter Scale Autotuning for ML Workloads
Abstract:
Search-based techniques have proven effective in solving complex optimization problems that arise in domain-specific compilers for machine learning (ML). Unfortunately, deploying such techniques in production compilers is impeded by several limitations. In this talk, I will present an autotuner for production ML compilers that can tune both graph-level and subgraph-level optimizations at multiple compilation stages. We demonstrate how to incorporate machine learning techniques such as a learned cost model to reduce autotuning time. Our learned cost model has high accuracy and outperforms a heavily optimized analytical performance model. In an evaluation across 150 ML training and inference models on Tensor Processing Units (TPUs), the autotuner offers a runtime speedup of up to 2.4x, and 5% on average, over the heavily optimized XLA compiler.
In the second part of the talk, I will outline how we deploy the XLA autotuner at datacenter scale to automatically tune the most heavily used production models in Google’s fleet every day. The deployed tile size and flag autotuners have been saving approximately 2% of fleetwide TPU compute time. I will also share some of the challenges we experienced from deploying the autotuner in production, including the accuracy of runtime estimation of a graph, numerical issues, and compiler bugs.
Bio:
Mangpo is a research scientist at Google Brain, where she leads the Machine Learning for Machine Learning Compilers effort (one of Google Brain's moonshots in 2020). She is also involved in various research projects that apply programming language techniques to machine learning. Her research interests include compilers, machine learning for systems, program synthesis, and energy-aware computing. Mangpo completed a PhD in Computer Science at UC Berkeley. Her previous research focused on synthesis-aided compilation and programming models for emerging architectures, ranging from an ultra-low-power processor to a programmable network card. She was a recipient of the Microsoft Research PhD Fellowship and the Qualcomm Innovation Fellowship.