• If you are citizen of an European Union member nation, you may not use this service unless you are at least 16 years old.

  • You already know Dokkio is an AI-powered assistant to organize & manage your digital files & messages. Very soon, Dokkio will support Outlook as well as One Drive. Check it out today!


Non-Markovianness invalidates standard RL methods

Page history last edited by Satinder Singh 15 years, 4 months ago

The ambition of this page is to discuss (and hopefully dispel) the myth: Non-Markovianness invalidates most (basic) RL methods.


This myth says that all the basic RL algorithms, Q-learning, TD(lambda), Sarsa(lambda), Bellman-gradient, etc., depend on the Markovian assumption, i.e., that observations are state and hence are simply inapplicable to non-Markovian RL problems. And of course, most problems of interest are non-Markovian and hence basic RL algorithms are not very applicable...

There are several reasons for this myth

  1. Most (though not all) of the theoretical convergence results for these basic RL algorithms assume Markovianness. The reason for this is obvious - the math becomes easy.
  2. There are very simple non-Markovian problems in which specific RL algorithms such as Q-learning with lookup tables and without eligibiilty traces converges to bad policies.
  3. Some folks have experienced trouble with naive implementations of say lookup table Q-learning to non-Markovian problems and other folks have heard this trouble described.


The reasons that this is a myth are also many.

  1. The fact that the currently available positive convergence results for the basic RL algorithms assume Markovianness does not imply anything about how these algorithms deal with non-Markovianness.
  2. The negative examples of problems where the basic RL algorithms have trouble with non-Markovianness are just that - examples that are troubling for the simplest instantiations of the basic RL algorithms, the point here being that one would never use these forms of the basic RL algorithms in any real application of interest.
  3. As noted above, most (if not all) of the RL problems of interest are non-Markovianness and hence the many RL success stories on this page attest to the ability of RL algorithms to deal quite successfully with non-Markovianness. This is perhaps the strongest argument as to why this is a myth.


Now lets consider a few of the ways in which RL algorithms successfully deal with hidden state.

  1. Eligibility traces: there is empirical evidence that eligbility traces (being more Monte Carlo) are able to deal well with hidden state when added to basic RL algorithms like TD and Q-learning.
  2. Feature construction: when faced with any real problem the researcher/practioner designs state features that can include features derived from memory of past observations/features to build good estimates of state.
  3. Policy search methods: RL methods that are based on some explicit representation of a space of policies along with an algorithm for searching among the space of policies can easily be immune to the problem of hidden state.


Comments (0)

You don't have permission to comment on this page.