Statistics Seminar - Cong Ma, University of Chicago: "Learning with Few Updates: Batched Contextual Bandits"
- Event Type
- Seminar
- Sponsor
- Department of Statistics
- Location
- 106B1 Engineering Hall
- Date
- Nov 6, 2025 3:30 pm
- Originating Calendar
- Department of Statistics Event Calendar
Title: Learning with Few Updates: Batched Contextual Bandits
Abstract: Sequential decision-making is central to modern statistics, with applications ranging from clinical trials to online recommendation systems. Classical theory assumes that policies can be updated at every step, but in many modern experiments decisions can only be revised at a few discrete times, leading to batching constraints. Such limits on adaptivity inevitably affect statistical performance, raising a central question: how much efficiency is lost, and how many updates are needed for optimal learning?
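
To make the batching constraint concrete, here is a minimal, purely illustrative sketch of a batched bandit loop in which the policy may only be revised at a few pre-specified batch boundaries. The three-armed Bernoulli environment, the elimination rule, and the geometric grid are hypothetical choices for this example and are not the algorithm discussed in the talk.

```python
# Illustrative sketch only: a generic batched bandit loop, not the speaker's method.
# The Bernoulli environment, the batch grid, and the elimination rule are assumptions
# made for this example.
import numpy as np

rng = np.random.default_rng(0)

def batched_successive_elimination(means, T, grid):
    """Pull arms round-robin within each batch; update the policy only at grid points."""
    K = len(means)
    active = list(range(K))
    pulls = np.zeros(K)
    rewards = np.zeros(K)
    regret = 0.0
    t_prev = 0
    for t_k in grid:
        # Within a batch: no policy updates, just play the currently active arms.
        for t in range(t_prev, t_k):
            a = active[t % len(active)]
            r = rng.binomial(1, means[a])
            pulls[a] += 1
            rewards[a] += r
            regret += max(means) - means[a]
        t_prev = t_k
        # At the batch boundary: eliminate arms whose upper confidence bound falls
        # below the best lower confidence bound.
        est = rewards[active] / np.maximum(pulls[active], 1)
        width = np.sqrt(2 * np.log(T) / np.maximum(pulls[active], 1))
        keep = est + width >= (est - width).max()
        active = [a for a, k in zip(active, keep) if k]
    return regret, len(grid)

T = 10_000
# A geometric grid with O(log log T) batch boundaries (illustrative choice).
M = int(np.ceil(np.log2(np.log2(T)))) + 1
grid = sorted(set(min(T, int(T ** (1 - 2.0 ** -(k + 1)))) for k in range(M)) | {T})
regret, n_batches = batched_successive_elimination([0.5, 0.45, 0.3], T, grid)
print(f"{n_batches} batches, cumulative regret = {regret:.1f}")
```

The point of the sketch is only that arm statistics accumulate passively inside a batch and the decision rule changes just a handful of times over the whole horizon, which is the adaptivity limit the abstract refers to.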
I will address this question in the setting of contextual bandits with smooth reward functions. I will first present a success story: when the margin parameter is known, only $\log\log T$ batches are needed to match the minimax regret rates of the fully online setting—showing that very limited adaptivity is enough for optimal learning. I will then turn to the more subtle case where the margin parameter is unknown. In the online regime, adaptation comes at no cost, but batching introduces a genuine barrier: there is a provable statistical price to be paid. I will describe recent results that sharply characterize this price under adaptive batch schedules.
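
For readers less familiar with the objects named above, the display below is an editorial background sketch of one standard setup: the regret notion, the minimax rate usually associated with the fully online smooth-reward-plus-margin setting, and the doubling-exponent grid that underlies $O(\log\log T)$ batch schedules. The symbols $f_a$, $X_t$, $A_t$, $\beta$, $\alpha$, $d$ and the specific rate are drawn from the standard literature as assumptions, not from this announcement.

```latex
% Background sketch only (editorial assumptions, not stated in this announcement):
% a standard regret formulation for smooth contextual bandits and the
% doubling-exponent grid that yields O(log log T) batch boundaries.
% Here f_a is the (beta-Holder smooth) mean reward of arm a, X_t the context,
% A_t the chosen arm, alpha the margin parameter, and d the context dimension.
\[
  \mathrm{Regret}(T)
    \;=\; \mathbb{E}\sum_{t=1}^{T}\Bigl(\max_{a} f_a(X_t) - f_{A_t}(X_t)\Bigr),
  \qquad
  \inf_{\pi}\sup_{f}\;\mathrm{Regret}(T)
    \;\asymp\; T^{\,1-\frac{\beta(1+\alpha)}{2\beta+d}}
  \quad\text{(fully online, up to logarithmic factors).}
\]
\[
  t_k \;=\; \bigl\lceil T^{\,1-2^{-k}}\bigr\rceil,\qquad k=1,\dots,M,
  \qquad
  M \;=\; \bigl\lceil \log_2\log_2 T \bigr\rceil
  \;\Longrightarrow\; t_M \;\ge\; T/2 .
\]
```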
