Siebel School Speakers Calendar

iDS^2 Mini-Workshop on Asymptotics and Non-Asymptotics in Control and Reinforcement Learning: Stochastic Control Problems, and How You Can Solve Yours

Event Type

Conference/Workshop

Sponsor

Coordinated Science Laboratory

Virtual

Date

Apr 16, 2021, 12:30 - 3:00 pm, with a break from 1:30 pm to 2:00 pm

Speaker

Sean Meyn (University of Florida)

Contact

Maxim Raginsky

E-Mail

maxim@illinois.edu

Abstract: Convergence theory for reinforcement learning is sparse: barely existent for Q-learning outside of the special case of Watkins, and the situation is even worse for RL with nonlinear function approximation. This is unfortunate, given the current interest in neural networks. What’s more, every user of RL knows that it can be insanely slow and unreliable. The talk will begin with explanations for slow convergence based on a combination of statistical reasoning and nonlinear dynamical systems theory. The special sauce in this lecture is an approach to universal stability of RL based on generalizations of Zap Q-learning.

Apologies in advance: there will be no finite-n bounds in this lecture; all results are asymptotic. We will see why there is little hope for useful finite-n bounds when we consider algorithms with “noise” that has memory (such as in standard Markovian settings).

REFERENCES:

S. Chen, A. M. Devraj, F. Lu, A. Busic, and S. Meyn. Zap Q-Learning with nonlinear function approximation. In H. Larochelle, M. Ranzato, R. Hadsell, M. F. Balcan, and H. Lin, editors, Advances in Neural Information Processing Systems, volume 33, pages 16879–16890. Curran Associates, Inc., 2020. Also arXiv e-prints 1910.05405.
A. M. Devraj, A. Busic, and S. Meyn. Fundamental design principles for reinforcement learning algorithms. In K. G. Vamvoudakis, Y. Wan, F. L. Lewis, and D. Cansever, editors, Handbook on Reinforcement Learning and Control. Springer, 2021.
S. Meyn. Control Systems and Reinforcement Learning. Cambridge University Press, 2021 (draft available upon request).