*This presentation will be recorded.
Data-driven methodology is a pillar of real-world decision-making. When applying statistical learning methods, puzzling phenomena arise in choosing estimators, tuning their parameters, and characterizing bias-variance trade-offs. In many settings, asymptotic and/or worst-case theory fails to provide the relevant guidance, so a more refined approach, both non-asymptotic and instance-optimal, is required.
In this talk, I present some recent advances in optimal procedures for statistical decision-making. The bulk of the talk focuses on function approximation methods for policy evaluation in reinforcement learning. I describe a novel class of optimal, instance-dependent oracle inequalities for projected Bellman equations, as well as efficient Markovian stochastic approximation procedures that achieve these guarantees. These instance-dependent results can guide parameter tuning for temporal difference learning. Drawing on this perspective, I then discuss instance-dependent optimal methods for off-policy estimation in contextual bandits, and illustrate how the bias-variance trade-off for decision-making can differ substantially from that in statistical learning.
Wenlong Mou is a Ph.D. student in the Department of EECS at UC Berkeley, advised by Martin Wainwright and Peter Bartlett. Prior to Berkeley, he received his B.Sc. degree in Computer Science from Peking University. Wenlong's research interests include statistics, machine learning theory, dynamic programming and optimization, and applied probability. He is particularly interested in designing optimal statistical methods for data-driven decision-making, powered by efficient computational algorithms.