Title: Scalable feature importance estimation, Shapley values and feature selection
Abstract: Interpreting so-called "black-box" machine learning algorithms has attracted great interest in recent years. A concrete, model-agnostic approach to interpretation is to estimate feature importance and perform feature selection. This talk is divided into two parts: the first part presents a scalable approach to feature importance estimation based on fine-tuning gradient-based training of neural networks and gradient boosted decision trees. Theoretical guarantees are established by exploiting the neural tangent kernel and an analysis of early stopping for kernel methods. Brief connections to fine-tuning pre-trained large language models are also discussed. The second part focuses on reliable feature selection by adapting Shapley value estimates to test whether features are statistically significant. By exploiting connections to DAG learning, I introduce a MinShap algorithm that is empirically far more reliable than state-of-the-art methods such as leave-one-covariate-out (LOCO) and the generalized covariance measure (GCM).
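For a flavor of the first part, here is a minimal, self-contained sketch (an illustration, not the speaker's method) of gradient-based importance with early stopping. It assumes a plain least-squares model trained by gradient descent, where the number of steps acts as the regularizer; the talk extends this style of analysis to neural networks and gradient boosted trees via the neural tangent kernel. The helper name `early_stopped_importance` and the toy data are hypothetical.

```python
# A minimal sketch: gradient descent on squared loss, stopped early.
# The stopping time plays the role of a regularizer, and |beta_j| at
# the stopping time serves as a simple importance score for feature j.
# Illustrative only; not the method presented in the talk.
import numpy as np


def early_stopped_importance(X, y, step_size=0.01, n_steps=200):
    """Run gradient descent on 0.5 * ||y - X beta||^2 / n and stop early."""
    n, d = X.shape
    beta = np.zeros(d)
    for _ in range(n_steps):
        grad = -X.T @ (y - X @ beta) / n  # gradient of the squared loss
        beta -= step_size * grad
    return np.abs(beta)  # importance score per feature


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    n, d = 500, 10
    X = rng.standard_normal((n, d))
    # Sparse ground truth: only features 0 and 3 matter.
    y = 3.0 * X[:, 0] + 1.5 * X[:, 3] + rng.standard_normal(n)
    print(np.round(early_stopped_importance(X, y), 2))
```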
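For the second part, the sketch below computes exact Shapley values for feature importance, assuming the value function v(S) is the in-sample R² of an ordinary-least-squares fit on feature subset S. This is illustrative only and is not the MinShap algorithm from the talk; the exponential cost of exact computation is precisely why scalable and reliable Shapley estimates matter.

```python
# A minimal sketch of exact Shapley values for feature importance:
# phi_j = sum over S not containing j of |S|!(d-|S|-1)!/d! * [v(S+{j}) - v(S)],
# with v(S) taken to be the in-sample R^2 of OLS on subset S (an assumption).
# Cost is O(2^d), so this is only feasible for small d.
import itertools
import math

import numpy as np


def r_squared(X, y, subset):
    """Value function v(S): in-sample R^2 of OLS on the columns in `subset`."""
    if not subset:
        return 0.0
    Xs = np.column_stack([np.ones(len(y)), X[:, list(subset)]])  # add intercept
    beta, *_ = np.linalg.lstsq(Xs, y, rcond=None)
    resid = y - Xs @ beta
    return 1.0 - resid @ resid / ((y - y.mean()) @ (y - y.mean()))


def exact_shapley(X, y):
    """Exact Shapley value of each feature under the R^2 value function."""
    d = X.shape[1]
    phi = np.zeros(d)
    for j in range(d):
        others = [k for k in range(d) if k != j]
        for size in range(len(others) + 1):
            w = math.factorial(size) * math.factorial(d - size - 1) / math.factorial(d)
            for S in itertools.combinations(others, size):
                phi[j] += w * (r_squared(X, y, S + (j,)) - r_squared(X, y, S))
    return phi


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n, d = 500, 5
    X = rng.standard_normal((n, d))
    # Only the first two features matter; the rest are noise.
    y = 2.0 * X[:, 0] - 1.0 * X[:, 1] + 0.5 * rng.standard_normal(n)
    print(np.round(exact_shapley(X, y), 3))
```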
Bio: Garvesh Raskutti is a Professor in the Department of Statistics at UW–Madison. His research interests include interpretable machine learning, high-dimensional statistics, and graphical models.