Computer Science Speaker Series Master Calendar


COLLOQUIUM: Simon Du, "Pre-Training Data Selection for Representation Learning"

Event Type
Illinois Computer Science
HYBRID: 2405 Siebel Center for Computer Science or online
Feb 26, 2024, 3:30 pm
Originating Calendar: Computer Science Colloquium Series


Refreshments Provided.

Pre-training datasets are a critical component in recent breakthroughs in artificial intelligence. However, their design has not received the same level of research attention as model architectures or training algorithms. In this presentation, I will discuss our recent work on pre-training data selection for representation learning in the contexts of multi-modal contrastive learning and multi-task representation learning. For multi-modal contrastive learning, we propose a new notion, the Variance Alignment Score (VAS). We demonstrate that by maximizing the VAS as a data selection strategy, we can achieve superior performance on dataset selection benchmarks. For multi-task representation learning, we explore how to select the most relevant pre-training tasks for a target downstream task. We introduce a metric to characterize task relevance and design a new method for actively selecting the most pertinent tasks.

Simon S. Du is an assistant professor in the Paul G. Allen School of Computer Science & Engineering at the University of Washington. His research interests are broadly in machine learning, including deep learning, representation learning, and reinforcement learning. Before joining the faculty, he was a postdoc at the Institute for Advanced Study. He completed his Ph.D. in Machine Learning at Carnegie Mellon University. Simon's research has been recognized by a Samsung AI Researcher of the Year Award, an NSF CAREER award, an Intel Rising Star Faculty Award, an Nvidia Pioneer Award, a AAAI New Faculty Highlights selection, and a Distinguished Dissertation Award honorable mention from CMU, among others.

Part of the Illinois Computer Science Speakers Series. Faculty Host: Hanghang Tong

Meeting ID: 860 6491 0025 
Passcode: csillinois

If accommodation is required, please email <> or <>. Someone from our staff will contact you to discuss your specific needs.


