Zoom: https://illinois.zoom.us/j/86064910025?pwd=OUZSYkx1alpsbkl2Kys5MnZDZTljdz09
Refreshments Provided.
Abstract:
Pre-training datasets are a critical component in recent breakthroughs in artificial intelligence. However, their design has not received the same level of research attention as model architectures or training algorithms. In this presentation, I will discuss our recent work on pre-training data selection for representation learning in the contexts of multi-modal contrastive learning and multi-task representation learning. For multi-modal contrastive learning, we propose a new notion, the Variance Alignment Score (VAS). We demonstrate that by maximizing the VAS as a data selection strategy, we can achieve superior performance on dataset selection benchmarks. For multi-task representation learning, we explore how to select the most relevant pre-training tasks for a target downstream task. We introduce a metric to characterize task relevance and design a new method for actively selecting the most pertinent tasks.
Bio:
Simon S. Du is an assistant professor in the Paul G. Allen School of Computer Science & Engineering at the University of Washington. His research interests span machine learning broadly, including deep learning, representation learning, and reinforcement learning. Prior to starting as faculty, he was a postdoc at the Institute for Advanced Study. He completed his Ph.D. in Machine Learning at Carnegie Mellon University. Simon's research has been recognized by a Samsung AI Researcher of the Year Award, an NSF CAREER award, an Intel Rising Star Faculty Award, an Nvidia Pioneer Award, a AAAI New Faculty Highlights, and a Distinguished Dissertation Award honorable mention from CMU, among others.
Part of the Illinois Computer Science Speakers Series. Faculty Host: Hanghang Tong
Meeting ID: 860 6491 0025
Passcode: csillinois
If accommodation is required, please email <erink@illinois.edu> or <communications@cs.illinois.edu>. Someone from our staff will contact you to discuss your specific needs.