Link to Talk Video: https://mediaspace.illinois.edu/media/t/1_f7akxvbi
Abstract: Suppose you are asked to provide feedback on a bunch of items (say movies, pens, perfumes, apps, etc.) you have used on a platform. Would you be more comfortable giving your feedback as ratings of the items on a scale of 0-10, or would you rather just rank the items in order of your preferences? Or, even better, maybe you could simply select your favorite one from the set!
Customer statistics collected in several real-world systems show that users often prefer expressing their liking for a given pair of items, say (A,B), through relative queries like "Do you prefer item A over B?", rather than their absolute counterparts: "How would you score items A and B on a scale of 0-10?"
Drawing inspiration from this observation, and with the hope of building cost-effective, user-friendly systems, researchers arrived at the famous formulation of Dueling Bandits (DB) [Yue & Joachims'11], which was followed by a huge surge of interest in online learning from pairwise preferences in the Bandits/Online-ML community. The setting can be extremely useful in diverse real-life applications: recommendation systems, crowd-sourcing platforms, search-engine optimization, online retail, and even more complex tasks like designing multi-player games or training chatbots and humanoid robots, just to name a few. Unfortunately, however, existing DB techniques have been predominantly limited to the simpler settings of pairwise preferences, finite decision spaces, and stochastic environments, which are clearly restrictive from a practical standpoint.
In this talk, we will start with the problem objective(s) for learning from preference feedback and familiarize ourselves with some of the breakthrough results in the recent literature. Following this, we will dive deep into a practical framework of contextual dueling bandits (C-DB), where the goal of the learner is to make customized predictions based on the users' needs (or user contexts). Despite the practicality of the setup, there has not been any efficient and optimal algorithm for this framework, owing to the challenge of the very limited information available from preference feedback.
More formally, we will focus on the C-DB regret minimization problem under realizability, where the feedback is generated by a pairwise preference matrix that is well-specified by a given function class F. We will then study a new algorithmic approach that efficiently achieves the optimal performance for this problem. In fact, we will consider a new notion of 'best response' regret, which is a strictly stronger performance measure than those considered in prior works, and yet show how to achieve the optimal regret guarantee in a computationally efficient manner (assuming access to an online oracle for square-loss regression over F). This also resolves an open problem of Dudík et al. [COLT, 2015] on oracle-efficient, regret-optimal algorithms for C-DB. We will conclude the talk with a brief overview of further potential avenues for bringing preference-based learning into real-world systems, along with the interesting open problems they raise.
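To make the preference-feedback protocol concrete, here is a minimal, self-contained sketch (illustrative only, not the algorithm from the talk) of the finite-armed dueling-bandit feedback model: the learner repeatedly picks a pair of arms and observes only the noisy winner of the duel, drawn from an underlying preference matrix P. The specific matrix and the naive uniform-exploration strategy below are assumptions for illustration.

```python
import random

# Toy preference matrix: P[i][j] = probability that arm i beats arm j.
# Arm 0 is the Condorcet winner here (it beats every other arm with
# probability > 1/2). These numbers are made up for illustration.
K = 3
P = [[0.5, 0.7, 0.8],
     [0.3, 0.5, 0.6],
     [0.2, 0.4, 0.5]]

def duel(i, j):
    """Play one duel between arms i and j; return the (noisy) winner."""
    return i if random.random() < P[i][j] else j

random.seed(0)
wins = [0] * K    # duels won by each arm
plays = [0] * K   # duels each arm participated in

# Naive strategy: sample pairs uniformly at random (pure exploration).
for _ in range(2000):
    i, j = random.sample(range(K), 2)
    w = duel(i, j)
    wins[w] += 1
    plays[i] += 1
    plays[j] += 1

# Commit to the arm with the highest empirical win rate (a Borda-style
# winner); with this P it coincides with the Condorcet winner, arm 0.
best = max(range(K), key=lambda a: wins[a] / max(plays[a], 1))
print(best)
```

Note that the learner never observes absolute rewards, only binary duel outcomes; this limited feedback is exactly what makes regret minimization in the (contextual) dueling-bandit setting challenging.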
[Major part of the talk on the C-DB setup is based on a joint work with Akshay Krishnamurthy, "Efficient and Optimal Algorithms for Contextual Dueling Bandits under Realizability", ALT 2022.]
Bio: Aadirupa is a visiting faculty member at TTI Chicago. Before this, she was a postdoctoral researcher at Microsoft Research New York City. She obtained her Ph.D. from the Department of Computer Science, Indian Institute of Science, Bangalore, advised by Aditya Gopalan and Chiranjib Bhattacharyya. Aadirupa has interned at Microsoft Research, Bangalore; Inria, Paris; and Google AI, Mountain View.
Her research interests include Bandits, Reinforcement Learning, Optimization, Learning Theory, and Algorithms. Of late, she has also become very interested in problems at the intersection of ML and Game Theory, Algorithmic Fairness, and Privacy.
Part of the Illinois Computer Science Speakers Series. Faculty Host: Nan Jiang