Abstract: Convergence theory for reinforcement learning is sparse: it is barely existent for Q-learning outside the special case of Watkins (the tabular setting), and the situation is even worse for RL with nonlinear function approximation. This is unfortunate, given the current interest in neural networks. What’s more, every user of RL knows that it can be insanely slow and unreliable. The talk will begin with explanations for slow convergence based on a combination of statistical reasoning and nonlinear dynamical systems theory. The special sauce in this lecture is an approach to universal stability of RL based on generalizations of Zap Q-learning.
Apologies in advance: there will be no finite-n bounds in this lecture; all results are asymptotic. We will see why there is little hope for useful finite-n bounds when we consider algorithms with “noise” that has memory (such as in standard Markovian settings).
REFERENCES:
- S. Chen, A. M. Devraj, F. Lu, A. Busic, and S. Meyn. Zap Q-Learning with nonlinear function approximation. In H. Larochelle, M. Ranzato, R. Hadsell, M. F. Balcan, and H. Lin, editors, Advances in Neural Information Processing Systems, volume 33, pages 16879–16890. Curran Associates, Inc., 2020. Also arXiv e-prints 1910.05405.
- A. M. Devraj, A. Busic, and S. Meyn. Fundamental design principles for reinforcement learning algorithms. In K. G. Vamvoudakis, Y. Wan, F. L. Lewis, and D. Cansever, editors, Handbook on Reinforcement Learning and Control. Springer, 2021.
- S. Meyn. Control Systems and Reinforcement Learning. Cambridge University Press, 2021 (draft available upon request).