A Markov decision process is called reversible if, for every stationary Markov control strategy, the resulting Markov chain is reversible. In view of their relevance in applications such as Markov chain Monte Carlo algorithms, such reversible Markov decision processes seem to be of interest to study in their own right.
We characterize all discrete-time reversible Markov decision processes with finite state and action spaces, generalizing earlier work of Cogill and Peng. We discuss how the policy iteration algorithm for finding an optimal stationary Markov control strategy simplifies considerably for reversible Markov decision processes. The potential to apply powerful tools from the study of reversible Markov chains, such as the Poisson gas of Markovian loops and the Gaussian free field, will also be discussed.
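To illustrate the notion of reversibility underlying the abstract (this sketch is not taken from the talk): a finite Markov chain with transition matrix P and stationary distribution pi is reversible if and only if it satisfies detailed balance, pi_i P_ij = pi_j P_ji for all states i, j. The check below is a minimal numerical sketch of that condition; the example matrices are illustrative choices, not examples from the talk.

```python
import numpy as np

def stationary_distribution(P):
    """Stationary distribution of transition matrix P:
    the left eigenvector for eigenvalue 1, normalized to sum to 1."""
    vals, vecs = np.linalg.eig(P.T)
    v = np.real(vecs[:, np.argmin(np.abs(vals - 1))])
    return v / v.sum()

def is_reversible(P, tol=1e-10):
    """Detailed balance test: pi_i * P[i, j] == pi_j * P[j, i] for all i, j."""
    pi = stationary_distribution(P)
    flow = pi[:, None] * P  # flow[i, j] = pi_i * P[i, j]
    return np.allclose(flow, flow.T, atol=tol)

# A birth-death chain (tridiagonal P) is always reversible.
P_bd = np.array([[0.5, 0.5, 0.0],
                 [0.3, 0.4, 0.3],
                 [0.0, 0.5, 0.5]])

# A chain biased around a directed 3-cycle is not reversible.
P_cyc = np.array([[0.1, 0.8, 0.1],
                  [0.1, 0.1, 0.8],
                  [0.8, 0.1, 0.1]])
```

In a reversible Markov decision process, `is_reversible` would return True for the chain induced by every stationary Markov control strategy, not just for one particular transition matrix.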
The speaker received his undergraduate degree from IIT Madras in 1980 and his graduate degrees in electrical engineering and mathematics from U.C. Berkeley between 1980 and 1986. From 1986 to 1994 he was on the faculty of the School of Electrical Engineering at Cornell University. Since 1994 he has been on the faculty of the EECS department at U.C. Berkeley.