Computer Science Speakers Series

COLLOQUIUM: Arindam Banerjee, "SGD for Deep Learning: Empirical Geometry, Stability, and Smoothed Analysis"

Event Type
Seminar/Symposium
Sponsor
Illinois Computer Science
Location
https://mediaspace.illinois.edu/media/t/1_h3pdhutn
Virtual
Date
Mar 3, 2021   3:30 - 4:30 pm  
Cost
Free
Contact
Candice Steidinger
E-Mail
steidin2@illinois.edu
Originating Calendar
Computer Science Speakers Calendar

Recording available to view at: https://mediaspace.illinois.edu/media/t/1_h3pdhutn 

 

Abstract:

While the past decade has seen unprecedented empirical success of deep learning models, the generalization behavior of such models remains shrouded in mystery. We will start by briefly reviewing recent work indicating that generalization may be explained by properties of the optimization algorithm used for learning rather than by the expressive power of the function class. The talk will then focus on Stochastic Gradient Descent (SGD)-type algorithms and discuss their optimization and generalization properties. We will first present a set of empirical results illustrating the high-dimensional geometry of gradients and Hessians for deep models trained with SGD. Then, we discuss generalization of SGD-type algorithms based on stability, where mild changes in the data do not lead to large changes in the learned model. In particular, we present stability bounds based on a smoothed analysis of SGD, i.e., adding Gaussian noise to the stochastic gradients, and discuss the tradeoffs between optimization and generalization illustrated by such bounds. Further, we illustrate that such noisy SGD methods have essentially the same empirical performance as SGD while being much easier to analyze in theory.
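
As a rough illustration of the two ideas named in the abstract (Gaussian noise added to stochastic gradients, and stability under a small change in the data), here is a minimal sketch on a toy one-dimensional linear regression. It is not the speaker's method or analysis: the model, the function names (`make_data`, `noisy_sgd`, `parameter_gap`), and all parameter values are assumptions chosen only for illustration.

```python
import random


def make_data(n=40, seed=1):
    """Toy data (an assumption for this sketch): y = 2x + 1 plus small label noise."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        x = rng.uniform(-1.0, 1.0)
        data.append((x, 2.0 * x + 1.0 + rng.gauss(0.0, 0.1)))
    return data


def noisy_sgd(data, lr=0.05, sigma=0.01, epochs=50, seed=0):
    """Fit y ~ w*x + b with SGD, perturbing every stochastic gradient
    with independent Gaussian noise of standard deviation sigma."""
    rng = random.Random(seed)
    w, b = 0.0, 0.0
    for _ in range(epochs):
        order = list(range(len(data)))
        rng.shuffle(order)
        for i in order:
            x, y = data[i]
            err = (w * x + b) - y
            # Gradient of the per-example loss 0.5 * err**2.
            grad_w, grad_b = err * x, err
            # "Smoothed" step: add Gaussian noise to the stochastic gradient.
            grad_w += rng.gauss(0.0, sigma)
            grad_b += rng.gauss(0.0, sigma)
            w -= lr * grad_w
            b -= lr * grad_b
    return w, b


def parameter_gap(data, replacement, index=0, **kwargs):
    """Train on data and on a neighboring dataset that differs in one example,
    using the same seed for both runs, and return the distance between the
    learned parameters."""
    neighbor = list(data)
    neighbor[index] = replacement
    w1, b1 = noisy_sgd(data, **kwargs)
    w2, b2 = noisy_sgd(neighbor, **kwargs)
    return ((w1 - w2) ** 2 + (b1 - b2) ** 2) ** 0.5


if __name__ == "__main__":
    data = make_data()
    fresh_example = make_data(n=1, seed=2)[0]
    print("learned (w, b):", noisy_sgd(data))
    print("gap after replacing one example:", parameter_gap(data, fresh_example))
```

A small gap between the two runs is the informal sense of stability described above; setting `sigma=0.0` recovers plain SGD on the same toy problem for comparison.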

 

Bio:

Arindam Banerjee is a Founder Professor at the Department of Computer Science, University of Illinois Urbana-Champaign. His research interests are in machine learning and data mining, especially problems involving geometry and randomness. His current research focuses on computational and statistical aspects of deep learning, spatial and temporal data analysis, and sequential decision-making problems. His work also addresses applications to complex real-world problems in areas including climate science, ecology, recommendation systems, and finance. He has won several awards, including the NSF CAREER award (2010), the IBM Faculty Award (2013), and six best paper awards in top-tier venues.

 

Part of the Illinois Computer Science Speakers Series. Faculty Host: Hanghang Tong.
