Siebel School Speaker Series Master Calendar

COLLOQUIUM: Simon Du, "Pre-Training Data Selection for Representation Learning"

Event Type

Seminar/Symposium

Sponsor

Illinois Computer Science

Location

HYBRID: 2405 Siebel Center for Computer Science or online

Date

Feb 26, 2024 3:30 pm

Originating Calendar

Siebel School Colloquium Series

Zoom: https://illinois.zoom.us/j/86064910025?pwd=OUZSYkx1alpsbkl2Kys5MnZDZTljdz09

Refreshments Provided.

Abstract:
Pre-training datasets are a critical component in recent breakthroughs in artificial intelligence. However, their design has not received the same level of research attention as model architectures or training algorithms. In this presentation, I will discuss our recent work on pre-training data selection for representation learning in the contexts of multi-modal contrastive learning and multi-task representation learning. For multi-modal contrastive learning, we propose a new notion, the Variance Alignment Score (VAS). We demonstrate that by maximizing the VAS as a data selection strategy, we can achieve superior performance on dataset selection benchmarks. For multi-task representation learning, we explore how to select the most relevant pre-training tasks for a target downstream task. We introduce a metric to characterize task relevance and design a new method for actively selecting the most pertinent tasks.

Bio:
Simon S. Du is an assistant professor in the Paul G. Allen School of Computer Science & Engineering at the University of Washington. His research interests span machine learning broadly, including deep learning, representation learning, and reinforcement learning. Prior to starting as faculty, he was a postdoc at the Institute for Advanced Study. He completed his Ph.D. in Machine Learning at Carnegie Mellon University. Simon's research has been recognized by a Samsung AI Researcher of the Year Award, an NSF CAREER award, an Intel Rising Star Faculty Award, an Nvidia Pioneer Award, a AAAI New Faculty Highlights selection, and a Distinguished Dissertation Award honorable mention from CMU, among others.

Part of the Illinois Computer Science Speakers Series. Faculty Host: Hanghang Tong

Meeting ID: 860 6491 0025
Passcode: csillinois

If accommodation is required, please email <erink@illinois.edu> or <communications@cs.illinois.edu>. Someone from our staff will contact you to discuss your specific needs.
