Urbana Campus Research Calendar (OVCRI)


Fall 2020 IDS2 Seminar Series

Event Type
Seminar/Symposium
Sponsor
Professor Sanmi Koyejo, Department of Computer Science
Date
Dec 4, 2020   2:15 - 3:15 pm  
Speaker
Shipra Agrawal, Cyrus Derman Assistant Professor of the Department of Industrial Engineering and Operations Research, Columbia University.
Cost
Free
Registration
Registration
Contact
Peggy Wells
E-Mail
pwells@illinois.edu
Originating Calendar
CSL SINE Group

UPDATED TIME:  2:15 p.m. TODAY, December 4, 2020

Zoom: 272 292 042 ; password: 035679

Meeting Link: https://illinois.zoom.us/j/272292042?pwd=WEFqNHpBekR6RVF1U1NFQkFyMm1CUT09

Title: Learning in structured MDPs with convex cost function: improved regret bounds for inventory management

Abstract: The stochastic inventory control problem under censored demands is a fundamental problem in revenue and supply chain management. A simple class of policies called "base-stock policies" is known to be optimal for this problem, and further, the convexity of the long-run average cost under such policies has been established. In this work, we present a learning algorithm for the stochastic inventory control problem with a lost-sales penalty and positive lead times, when the demand distribution is a priori unknown. Our main result is a regret bound of O(L\sqrt{T}+D) for the algorithm, where T is the time horizon, L is the fixed and known lead time, and D is an unknown parameter of the demand distribution described roughly as the number of time steps needed to generate enough demand for depleting one unit of inventory. Our results significantly improve the existing regret bounds for this problem. Notably, even though the state space of the underlying Markov Decision Process (MDP) in this problem is continuous and L-dimensional, our regret bounds depend linearly on L. Our techniques utilize convexity of the long-run average cost and a newly derived bound on the "bias" of base-stock policies to establish an almost black-box connection between the problem of learning and optimization in such MDPs and stochastic convex bandit optimization. The techniques presented here may be of independent interest for other settings that involve large structured MDPs but with convex cost functions. This talk is based on joint work with Randy Jia.
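For readers unfamiliar with the setting, the following is a minimal sketch of how a base-stock policy behaves under lost sales and a fixed lead time. It is not the learning algorithm from the talk. It only simulates a fixed base-stock level and reports the average per-period cost. The function names, the holding cost, the lost-sales penalty, and the exponential demand distribution are all illustrative assumptions that do not come from the abstract.

```python
import random
from collections import deque


def average_cost(base_stock, lead_time, demands,
                 holding_cost=1.0, lost_sale_penalty=4.0):
    """Average per-period cost of a fixed base-stock policy with lost sales.

    Cost parameters are hypothetical values chosen for illustration.
    """
    on_hand = base_stock                    # start with a full shelf
    pipeline = deque([0.0] * lead_time)     # orders placed but not yet delivered
    total_cost = 0.0
    for demand in demands:
        # Order up to the base-stock level, counting stock already in transit.
        position = on_hand + sum(pipeline)
        pipeline.append(max(0.0, base_stock - position))
        # The order placed lead_time periods ago arrives now.
        on_hand += pipeline.popleft()
        # Unmet demand is lost rather than backlogged.
        sales = min(on_hand, demand)
        lost = demand - sales
        on_hand -= sales
        total_cost += holding_cost * on_hand + lost_sale_penalty * lost
    return total_cost / len(demands)


def cost_curve(levels, lead_time, horizon=20000, mean_demand=1.0, seed=0):
    """Estimate the average cost at each base-stock level on one demand path."""
    rng = random.Random(seed)
    # Assumed demand model: i.i.d. exponential with the given mean.
    demands = [rng.expovariate(1.0 / mean_demand) for _ in range(horizon)]
    return {s: average_cost(s, lead_time, demands) for s in levels}


if __name__ == "__main__":
    for level, cost in cost_curve([0, 1, 2, 3, 4, 5, 6], lead_time=2).items():
        print(f"base-stock level {level}: average cost {cost:.3f}")
```

Sweeping the base-stock level in this way gives an empirical view of the long-run average cost as a function of the level, which is the quantity whose convexity the abstract relies on. Note that this simulation observes the full demand, whereas the talk considers censored demand, where only sales are observed.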

Bio: Shipra Agrawal is the Cyrus Derman Assistant Professor in the Department of Industrial Engineering and Operations Research at Columbia University. She is also affiliated with the Department of Computer Science and the Data Science Institute. Her research spans optimization, machine learning, and sequential decision making under uncertainty, particularly in the areas of multi-armed bandits, online learning, and reinforcement learning. Shipra serves as an associate editor for Management Science, Mathematics of Operations Research, and the INFORMS Journal on Computing. Her research is supported by a Google Faculty Research Award, an Amazon Research Award, and an NSF CAREER Award.
