DAIS Seminar: Dr. Quanquan Gu, "Unleashing the power of variance reduction for training large models."

Sep 17, 2025 11:00 am - 12:30 pm

4124 Siebel Center

Seminar/Symposium

Sponsor: Prof. Jiawei Han and Prof. Chengxiang Zhai
Speaker: Dr. Quanquan Gu
Contact: Allison Mette
E-Mail: agk@illinois.edu
Phone: 217-300-0256
Originating Calendar: Siebel School Speakers Calendar

Abstract: Training deep neural networks and large language models demands efficient and scalable optimizers. Adaptive gradient algorithms like Adam, AdamW, and their variants have been central to this task. Despite the development of numerous variance reduction algorithms in the past decade aimed at accelerating stochastic optimization in both convex and nonconvex settings, variance reduction has not found widespread success in training deep neural networks or large language models. Consequently, it has remained a less favored approach in modern AI. In this talk, I will introduce a unified optimization framework, MARS (Make vAriance Reduction Shine), which reconciles preconditioned gradient methods with variance reduction via a scaled stochastic recursive momentum technique. Within this framework, I will introduce three instances of MARS that leverage preconditioned gradient updates based on AdamW, Lion, and Shampoo, respectively. In addition, I will draw a connection between our algorithms and existing optimizers. Experimental results on training GPT-2 models indicate that MARS consistently outperforms AdamW by a large margin.
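
For readers unfamiliar with the term, the following is a minimal sketch of a generic scaled stochastic recursive momentum step, the kind of variance-reduction idea the abstract refers to. It is not the speaker's MARS implementation: the toy objective f(x) = 0.5 * x**2, the function names, the hyperparameter values, and the exact form of the correction coefficient are all assumptions made for illustration, and the preconditioning step (AdamW, Lion, or Shampoo) is left out entirely.

```python
import random


def stochastic_grad(x, noise):
    """Noisy gradient of the hypothetical objective f(x) = 0.5 * x**2."""
    return x + noise


def scaled_recursive_momentum(x0, steps=200, lr=0.1, beta=0.9,
                              gamma=0.025, noise_std=1.0, seed=0):
    """Plain gradient descent driven by a variance-reduced momentum.

    All defaults are illustrative assumptions, not values from the talk.
    """
    rng = random.Random(seed)
    x_prev, x, m = x0, x0, 0.0
    for _ in range(steps):
        # Draw one sample and evaluate the gradient at BOTH the current and
        # the previous iterate with it, so the sampling noise in the
        # difference largely cancels.
        noise = rng.gauss(0.0, noise_std)
        g = stochastic_grad(x, noise)
        g_prev = stochastic_grad(x_prev, noise)
        # Corrected gradient: the same-sample difference, scaled by gamma.
        c = g + gamma * (beta / (1.0 - beta)) * (g - g_prev)
        # Exponential moving average of the corrected gradient.
        m = beta * m + (1.0 - beta) * c
        # Unpreconditioned update; a preconditioner would rescale m here.
        x_prev, x = x, x - lr * m
    return x


if __name__ == "__main__":
    print(scaled_recursive_momentum(5.0))
```

Setting gamma to 0 in this sketch reduces the update to ordinary momentum on the raw stochastic gradient, which makes the role of the scaled correction term easy to isolate.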

Bio: Quanquan Gu is an Associate Professor of Computer Science at UCLA. His research is in artificial intelligence and machine learning, with a focus on nonconvex optimization, deep learning, reinforcement learning, large language models, and deep generative models. He received his Ph.D. in Computer Science from the University of Illinois at Urbana-Champaign in 2014. He is a recipient of the Sloan Research Fellowship, the NSF CAREER Award, and the Simons Berkeley Research Fellowship, among other industrial research awards.
