New Challenges for (Contextual) Multi-Armed Bandit: Fairness Objectives, Indirect Feedback, and Beyond
Abstract: While the multi-armed bandit is a decades-old and heavily studied problem, many new challenges have emerged in modern bandit applications. In this talk, I will explore several such challenges and present solutions from recent work. In particular, I will start with a multi-agent variant of the classic multi-armed bandit setup and discuss how to efficiently learn the best policy when the objective is to ensure fairness among agents. Then, I will shift focus to some contextual bandit variants motivated by real-life applications, such as training language models, where the feedback to the learner is indirect (that is, not simply the reward of the selected action), and discuss some solutions that leverage recent advances in handling general function approximation.
Biography: Mengxiao Zhang is an assistant professor in the Business Analytics Department at the University of Iowa. He obtained his PhD in Computer Science from the University of Southern California, advised by Prof. Haipeng Luo. His research focuses on designing robust and adaptive machine learning algorithms with strong theoretical guarantees for general sequential learning problems, including online learning, bandit problems, game theory, and various operations research and revenue management applications. He has published papers in top-tier machine learning and learning theory conferences, including oral presentations. He has interned with Microsoft Research and Amazon, and received a B.S. from the School of EECS at Peking University.