The *ambition* of this page is to discuss (and hopefully dispel) the myth that POMDPs are hard for RL to deal with...

There are several sources for this myth

- It is known that POMDPs are PSPACE hard (someone please check this for me) to solve in the worst case.
- Many folks subscribe to the myth that large-state MDPs are hard for RL and so it seems even clearer to them that POMDPs are hard for RL.
- Many folks have tried to make lookup table Q-learning work on small POMDPs and failed.
- Early work on POMDPs by experts in RL produced results that show very slow computation of good policies for toy POMDPs. Others see these results and take it to mean that POMDPs are hard in general for RL.

The reasons why this is a myth are also many.

- Almost all the successes of RL have the characteristic that the observations available to the agent at each time step are not a sufficient statistic of history, and hence the domain is a POMDP. Therefore, clearly RL succeeds at some large-scale POMDPs.
- It is true that solving POMDPs is hard in the worst case but that does not seem to be holding us back in solving them when
*good* RL methods are applied by researchers with sufficient expertise in RL.
- Empirical work has demonstrated the effectiveness of the use of eligibility traces in overcoming some of the issues with hidden state.
- Some of the more recent RL algorithms such as sparse-sampling and trajetory-tree scale independently of the size of the state space (and hence apply equally well to POMDPs where the state space is the belief space and hence continuous).

## Comments (0)

You don't have permission to comment on this page.