If you are citizen of an European Union member nation, you may not use this service unless you are at least 16 years old.
You already know Dokkio is an AI-powered assistant to organize & manage your digital files & messages. Very soon, Dokkio will support Outlook as well as One Drive. Check it out today!

POMDPs are hard for RL to deal with

Page history last edited by Satinder Singh 15 years, 1 month ago

The ambition of this page is to discuss (and hopefully dispel) the myth that POMDPs are hard for RL to deal with...

There are several sources for this myth

It is known that POMDPs are PSPACE hard (someone please check this for me) to solve in the worst case.
Many folks subscribe to the myth that large-state MDPs are hard for RL and so it seems even clearer to them that POMDPs are hard for RL.
Many folks have tried to make lookup table Q-learning work on small POMDPs and failed.
Early work on POMDPs by experts in RL produced results that show very slow computation of good policies for toy POMDPs. Others see these results and take it to mean that POMDPs are hard in general for RL.

The reasons why this is a myth are also many.

Almost all the successes of RL have the characteristic that the observations available to the agent at each time step are not a sufficient statistic of history, and hence the domain is a POMDP. Therefore, clearly RL succeeds at some large-scale POMDPs.
It is true that solving POMDPs is hard in the worst case but that does not seem to be holding us back in solving them when good RL methods are applied by researchers with sufficient expertise in RL.
Empirical work has demonstrated the effectiveness of the use of eligibility traces in overcoming some of the issues with hidden state.
Some of the more recent RL algorithms such as sparse-sampling and trajetory-tree scale independently of the size of the state space (and hence apply equally well to POMDPs where the state space is the belief space and hence continuous).