Despite the dramatic success of deep learning over the past decade, our understanding
of why these methods work remains limited. This talk will shed light on two facets of the mystery:
optimization and generalization. For optimization, we will discuss the low-dimensional geometry
of gradients in deep learning, active vs. lazy learning, and some of their implications. For generalization,
we will discuss how smoothed analysis, which avoids worst-case analysis by adding a small amount of noise
to problem components, may be an effective approach to understanding generalization in deep
networks. We will present preliminary empirical and theoretical results on both facets, and
discuss directions for future work.
Arindam Banerjee is a Founder Professor in the Department of Computer Science at the University
of Illinois Urbana-Champaign. His research interests are in machine learning and data mining,
especially on problems involving geometry and randomness. His current research focuses on
computational and statistical aspects of deep learning, spatial and temporal data analysis, and
sequential decision-making problems. His work also addresses applications to complex real-world
problems in areas including climate science, ecology, recommendation systems, and
finance. He has won several awards, including the NSF CAREER award (2010), the
IBM Faculty Award (2013), and six best paper awards in top-tier venues.