COLLOQUIUM: Xin Bing, "Learning Large Softmax Mixtures with Warm Start EM"

Event Type
Seminar/Symposium
Sponsor
Siebel School of Computing and Data Science
Location
HYBRID: 2405 Siebel Center for Computer Science or online
Date
Apr 7, 2025, 3:30 pm
Originating Calendar
Siebel School Colloquium Series

Zoom: https://illinois.zoom.us/j/84569393374?pwd=4IOc8iwGp5u9XTDfJwXNa2AZNgHoYw.1

Refreshments Provided.

Abstract: 
Mixed multinomial logits are discrete mixtures introduced several decades ago to model the probability of choosing an attribute from p possible candidates in heterogeneous populations. The model has recently attracted attention in the AI literature under the name softmax mixtures, where it is routinely used in the final layer of a neural network to map a large number p of vectors in ℝ^L to a probability vector. Despite its wide applicability and empirical success, statistically optimal estimators of the mixture parameters, obtained via algorithms whose running time scales polynomially in L, are not known. This paper provides a solution to this problem for contemporary applications, such as large language models, in which the mixture has a large number p of support points and the size N of the sample observed from the mixture is also large. Our proposed estimator combines two classical estimators, obtained respectively via a method of moments (MoM) and the expectation-maximization (EM) algorithm. Although both estimator types have been studied, from a theoretical perspective, for Gaussian mixtures, no similar results exist for softmax mixtures for either procedure. We develop a new MoM parameter estimator based on latent moment estimation that is tailored to our model, and provide the first theoretical analysis for a MoM-based procedure in softmax mixtures. Although consistent, MoM for softmax mixtures can exhibit poor numerical performance, as has been observed in other mixture models. Nevertheless, since the MoM estimator provably lands in a neighborhood of the target, it can be used as a warm start for any iterative algorithm. We study the EM algorithm in detail and provide its first theoretical analysis for softmax mixtures. Our final proposal for parameter estimation is the EM algorithm with a MoM warm start.
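The estimation pipeline the abstract describes — an EM loop initialized from a warm start that already lies near the target — can be sketched for a small softmax (mixed multinomial logit) mixture. The code below is a hypothetical minimal sketch, not the paper's estimator: it accepts an arbitrary warm start `W0, pi0` (standing in for the latent-moment MoM initializer, which is not specified here) and runs a generalized EM whose M-step takes one responsibility-weighted gradient step per component.

```python
import numpy as np

def softmax(z):
    """Row-wise softmax with a max-shift for numerical stability."""
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def em_softmax_mixture(X, y, W0, pi0, n_iter=50, lr=0.2):
    """Generalized EM for a K-component softmax mixture.

    X:   (N, L) context vectors
    y:   (N,)   observed choices in {0, ..., p-1}
    W0:  (K, p, L) warm-start component weights (hypothetical
         stand-in for the MoM initializer from the abstract)
    pi0: (K,)   warm-start mixing weights
    """
    W, pi = W0.copy(), pi0.copy()
    N, K, p = X.shape[0], W0.shape[0], W0.shape[1]
    onehot = np.zeros((N, p))
    onehot[np.arange(N), y] = 1.0
    for _ in range(n_iter):
        # E-step: responsibilities r[i, k] ∝ pi_k * softmax(W_k x_i)[y_i]
        probs = softmax(np.einsum('kpl,nl->nkp', W, X))   # (N, K, p)
        lik = probs[np.arange(N), :, y]                   # (N, K)
        r = pi * lik
        r /= r.sum(axis=1, keepdims=True)
        # M-step (mixing weights): exact update
        pi = r.mean(axis=0)
        # M-step (component weights): one responsibility-weighted
        # gradient ascent step on the expected log-likelihood
        for k in range(K):
            grad = (r[:, k, None] * (onehot - probs[:, k, :])).T @ X / N
            W[k] += lr * grad
    return W, pi
```

Because the per-component M-step objective is a weighted multinomial logistic log-likelihood (concave in W_k), a small enough step size keeps each sweep from decreasing the observed-data likelihood — the generalized-EM guarantee. The warm start plays the role the abstract assigns to MoM: EM's guarantees are local, so initialization in a neighborhood of the target matters.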

Bio:
Xin has been an Assistant Professor in the Department of Statistical Sciences at the University of Toronto since March 2022. He finished his Ph.D. in Statistics in June 2021 in the Department of Statistics and Data Science at Cornell University, advised jointly by Florentina Bunea and Marten Wegkamp. Xin's research interests lie in developing new methodology with theoretical guarantees to tackle modern statistical problems, including high-dimensional statistics, mixture models, low-rank matrix estimation, topic models, multivariate analysis, non-parametric regression, minimax estimation, and statistical and computational trade-offs. He is also interested in applications of statistical methods to genetics, neuroscience, immunology, and other areas.


Part of the Siebel School Speakers Series. Faculty Host: Han Zhao


Meeting ID: 845 6939 3374
Passcode: csillinois


If accommodation is required, please email <erink@illinois.edu> or <communications@cs.illinois.edu>. Someone from our staff will contact you to discuss your specific needs.

