The training data that modern machine learning models ingest has a major impact on these models’ performance (as well as on their failures). Yet this impact tends to be neither fully appreciated nor understood at a sufficiently fine-grained level.
In this talk, we will discuss some of the key ways in which training data influences not only what but also how models “learn”, as well as tools to dissect this influence. In particular, we will present a new framework, called datamodeling, for directly casting predictions as functions of training data and the corresponding model class. This framework enables us to perform a range of model-class-driven data analyses, including discovering subpopulations, quantifying the brittleness of model predictions, and diagnosing other shortcomings of the training set.
Aleksander Madry is the Cadence Design Systems Professor of Computing at MIT. He leads the MIT Center for Deployable Machine Learning and is a faculty co-lead for the MIT AI Policy Forum. His research interests span algorithms, continuous optimization, and understanding machine learning from the perspectives of robustness and deployability.
Aleksander's work has been recognized with a number of awards, including an NSF CAREER Award, an Alfred P. Sloan Research Fellowship, an ACM Doctoral Dissertation Award Honorable Mention, and the Presburger Award. He received his PhD from MIT in 2011 and, prior to joining the MIT faculty, spent time at Microsoft Research New England and on the faculty of EPFL.
Part of the Illinois Computer Science Speakers Series. Faculty Host: Bo Li
Join us in person in 2405 Siebel Center for Computer Science, 201 N. Goodwin Ave., or via the Zoom meeting link (meeting ID: 863 0608 1124, password: csillinois).