Each myth or misstatement below is discussed on its own page:
- Large state spaces are hard for RL
- RL is slow
- RL does not have (m)any success stories since TD-Gammon
- RL does not work well with function approximation
- Value function approximation does not work (and so we should do something else; the current favorite alternative seems to be policy search)
- Non-Markovianity invalidates standard RL methods
- POMDPs are hard for RL to deal with
- RL is about learning optimal policies
The following old myths are, unfortunately, also still around and still damaging to the field:
- RL is model-free (or direct)
- RL is tabula rasa
- RL is table lookup (see the sketch after this list)
- RL = Q-learning or perhaps TD
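The last three myths in this list are related, and a tiny sketch may help separate them. Below is a minimal, hypothetical example (a toy 5-state chain MDP invented here for illustration, not taken from any of the linked pages) of Q-learning with a linear function approximator. With one-hot features the approximator reduces exactly to a lookup table, which makes the point that the table is a special case of function approximation rather than the definition of RL, and that Q-learning is one instance of RL rather than the whole field.

```python
import random

# Hypothetical 5-state chain MDP for illustration (not from the page):
# states 0..4, actions 0 = left / 1 = right, reward 1 for reaching state 4.
N_STATES, N_ACTIONS = 5, 2

def step(state, action):
    """Move along the chain; the episode ends at the rightmost state."""
    nxt = max(0, state - 1) if action == 0 else min(N_STATES - 1, state + 1)
    done = (nxt == N_STATES - 1)
    return nxt, (1.0 if done else 0.0), done

def features(state, action):
    """One-hot state-action features; any richer feature map could be swapped in."""
    phi = [0.0] * (N_STATES * N_ACTIONS)
    phi[state * N_ACTIONS + action] = 1.0
    return phi

def q(w, state, action):
    """Linear action-value estimate Q(s, a) = w . phi(s, a)."""
    return sum(wi * fi for wi, fi in zip(w, features(state, action)))

# Q-learning update with a linear approximator:
#   w <- w + alpha * (r + gamma * max_a' Q(s', a') - Q(s, a)) * phi(s, a)
w = [0.0] * (N_STATES * N_ACTIONS)
alpha, gamma, epsilon = 0.1, 0.9, 0.1
random.seed(0)

for _ in range(500):
    state, done = 0, False
    while not done:
        # Epsilon-greedy action selection with random tie-breaking.
        if random.random() < epsilon:
            action = random.randrange(N_ACTIONS)
        else:
            vals = [q(w, state, a) for a in range(N_ACTIONS)]
            action = random.choice([a for a, v in enumerate(vals) if v == max(vals)])
        nxt, reward, done = step(state, action)
        # No bootstrapping from a terminal state.
        target = reward if done else reward + gamma * max(
            q(w, nxt, a) for a in range(N_ACTIONS))
        td_error = target - q(w, state, action)
        # For a linear Q, the gradient w.r.t. w is just the feature vector.
        for i, fi in enumerate(features(state, action)):
            w[i] += alpha * td_error * fi
        state = nxt

print([round(max(q(w, s, a) for a in range(N_ACTIONS)), 3) for s in range(N_STATES)])
```

Replacing `features` with, say, tile coding or a learned representation leaves the rest of the loop unchanged; this is the sense in which "table lookup" is an implementation choice, not a defining property of RL.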