Baharan Mirzasoleiman Electrical and Computer Engineering Seminar

Event Type
Seminar/Symposium
Sponsor
Electrical and Computer Engineering
Location
B02 CSL Auditorium & Zoom
Date
Mar 24, 2025   10:00 - 11:00 am  
Speaker
Dr. Baharan Mirzasoleiman, University of California, Los Angeles
Contact
Angie Ellis
E-Mail
amellis@illinois.edu
Phone
217-300-1910
Originating Calendar
Illinois ECE Calendar

Electrical and Computer Engineering Seminar

Baharan Mirzasoleiman

Assistant Professor, University of California, Los Angeles

Monday, March 24, 2025, 10:00-11:00 am

B02 CSL Auditorium or Online via Zoom

Title: Data-efficient Training of Foundation Machine Learning Models 

Abstract: Large datasets have been crucial to the success of foundation machine learning models. However, training on massive data has two major limitations. First, it requires exceptionally large and expensive computational resources and incurs substantial cost through significant energy consumption. Second, because real-world datasets are highly imbalanced and noisy, training on the entire dataset does not yield optimal performance.

In this talk, I will argue that we can address the above limitations by developing techniques that identify and extract representative subsets from massive datasets. Training on representative subsets not only reduces the substantial costs of learning from big data, but also improves model accuracy and robustness. I will present two theoretically rigorous approaches to finding smaller subsets of examples that improve the performance and efficiency of training foundation models, such as Vision-Language Models (VLMs) and Large Language Models (LLMs). First, I will discuss how we can formulate an optimization problem to find smaller subsets of large image-text data to efficiently pretrain VLMs such as CLIP. Then, I will discuss how we can formulate and extract smaller subsets of language data that considerably improve the performance and efficiency of fine-tuning and pretraining LLMs. I will conclude each part with empirical results confirming the effectiveness of these data selection strategies.
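As a rough intuition for what "representative subset selection" can look like (this is a generic sketch, not the specific optimization formulations presented in the talk), one classic approach is greedy k-center selection over example embeddings: repeatedly add the example farthest from the current subset, so every data point ends up close to some selected representative.

```python
# Minimal sketch of representative subset selection via greedy k-center.
# This is an illustrative, generic method; the talk's actual formulations
# for VLM/LLM data selection are different and theoretically grounded.
import numpy as np

def greedy_k_center(embeddings: np.ndarray, k: int) -> list[int]:
    """Pick k indices so every point is near some selected point."""
    selected = [0]  # arbitrary seed: start from the first example
    # distance of every point to its nearest selected center so far
    dists = np.linalg.norm(embeddings - embeddings[0], axis=1)
    while len(selected) < k:
        i = int(np.argmax(dists))          # farthest point = least covered
        selected.append(i)
        new_d = np.linalg.norm(embeddings - embeddings[i], axis=1)
        dists = np.minimum(dists, new_d)   # update coverage distances
    return selected

# Toy usage: 100 random 8-dimensional "embeddings", keep 10 representatives.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 8))
subset = greedy_k_center(X, k=10)
```

Training would then proceed on only the selected subset, trading a small coverage loss for a large reduction in compute.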

Baharan Mirzasoleiman is an Assistant Professor in the Computer Science Department at UCLA, where she leads the BigML research group. Her research aims to improve the sustainability, reliability, and efficiency of machine learning. Before joining UCLA, she was a postdoctoral research fellow in Computer Science at Stanford University. She received her Ph.D. in Computer Science from ETH Zurich, where she was awarded an ETH medal for an outstanding doctoral thesis. She has received an NSF CAREER Award, an Okawa Research Award, a UCLA Hellman Fellows Award, and multiple faculty awards from Amazon, Optum AI, and Cisco. She was also named a Rising Star in EECS by MIT.
