*The presentation will be recorded.
Machine learning is playing an increasingly important role in decision making, with key applications ranging from dynamic pricing and recommendation systems to personalized medicine and clinical trials. While supervised machine learning traditionally excels at making predictions from i.i.d. offline data, many modern decision-making tasks require making sequential decisions from data collected online. This discrepancy raises the important challenge of bridging offline supervised learning and online interactive learning to unlock the full potential of data-driven decision making.
In the main part of this talk, we consider the challenge of reducing difficult online decision-making problems to well-understood offline supervised learning problems. Focusing on contextual bandits, a core class of online decision-making problems, we present the first optimal and efficient reduction from contextual bandits to offline regression. A remarkable consequence of our results is that advances in offline regression immediately translate to contextual bandits, both statistically and computationally. We illustrate the advantages of our results through new guarantees in complex operational environments and through experiments on real-world datasets. We also discuss extensions of our results to more challenging settings, including reinforcement learning in large state spaces.
After the main part, I will provide an overview of my additional work and broader research agenda on bridging online and offline learning toward improved data-driven decision making. I will highlight the importance of problem structure and discuss exciting opportunities for the operations research community.
Yunzong Xu is a fifth-year PhD student in the Institute for Data, Systems, and Society at MIT, advised by Prof. David Simchi-Levi. His research lies at the intersection of machine learning, operations research, and business analytics. His current research interests include data-driven decision making, online and reinforcement learning, and econometrics and causal inference, with applications to revenue management and healthcare. Over the course of his PhD, his research has been recognized by seven best paper awards (including finalist recognitions) from the INFORMS George Nicholson Student Paper Competition, the Applied Probability Society, the Data Mining Section, the Revenue Management and Pricing Section, the Service Science Section, and other competitions and organizations. His industry experience includes an internship at Microsoft Research on reinforcement learning, as well as an ongoing research collaboration with IBM and Boston Scientific on healthcare inventory management. Prior to joining MIT, he received dual bachelor's degrees in information systems and mathematics from Tsinghua University in 2018.