The ambition of this page is to discuss (and hopefully dispel) the myth that RL is slow.
Ok, so is RL slow?
What the statement usually means is one of the following:
- That the speaker knows of or has experienced one or more difficult problems on which the naive application of Q-learning or TD(λ), with inexpertly chosen feature-based state representations and inexperienced use of function approximation, leads to infeasibly slow learning (a concrete sketch of such a naive setup appears just after this list). This may well be true, but it certainly does not justify calling RL itself slow.
- That the speaker has heard many an RL talk showing learning curves for seemingly simple domains in which millions of actions or time steps were required to learn good policies. This justifiably seems slow to the speaker. Of course, in many cases those talks are not about developing efficient RL algorithms but about some AI issue such as "how can we learn hierarchical representations?" or "how can one do intra-option learning?". Unlike much of the work in supervised learning, which is largely performance driven, much of the work in RL is issue driven. This is a good thing! [Of course, some of the RL talks with such graphs are just shoddy work (and yes, that can happen even in RL). As RL researchers we can help squash this source of the myth by being very clear in our talks about the goals of the empirical work we present (and by not doing shoddy work, of course).]
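To make the first point above concrete, here is a minimal sketch, entirely of my own construction rather than taken from any particular talk or paper, of what a "naive" application might look like: epsilon-greedy semi-gradient Q-learning with a crudely hand-picked linear feature representation on a toy chain task. The environment, features, and hyperparameters below are all illustrative assumptions; the point is only that each of these choices (the features especially) can make learning look far slower than RL needs to be.

```python
# A minimal sketch of the kind of "naive" setup described in the first bullet
# above: epsilon-greedy semi-gradient Q-learning with crude, hand-picked
# linear features on a toy chain task. Everything here (environment, features,
# hyperparameters) is an illustrative assumption, not from any specific work.
import numpy as np

N_STATES = 10        # chain of states 0..9; episode ends at the rightmost state
N_ACTIONS = 2        # 0 = left, 1 = right
ALPHA = 0.1          # step size, picked without tuning (deliberately "naive")
GAMMA = 0.99
EPSILON = 0.1
MAX_STEPS = 500      # per-episode cap so the sketch always terminates

def features(state):
    """A crude two-component feature vector: normalized position plus a bias.
    Inexpert representational choices like this are often what make learning
    look slow, not the underlying RL algorithm."""
    return np.array([state / (N_STATES - 1), 1.0])

w = np.zeros((N_ACTIONS, 2))     # one linear weight vector per action

def q(state, action):
    return float(w[action] @ features(state))

def step(state, action):
    """Chain dynamics: every move costs -1; reaching the right end terminates."""
    nxt = min(state + 1, N_STATES - 1) if action == 1 else max(state - 1, 0)
    return nxt, -1.0, nxt == N_STATES - 1

for episode in range(300):
    s = 0
    for t in range(MAX_STEPS):
        qs = np.array([q(s, b) for b in range(N_ACTIONS)])
        if np.random.rand() < EPSILON:
            a = np.random.randint(N_ACTIONS)
        else:                        # greedy with random tie-breaking
            a = int(np.random.choice(np.flatnonzero(qs == qs.max())))
        s2, r, done = step(s, a)
        target = r if done else r + GAMMA * max(q(s2, b) for b in range(N_ACTIONS))
        # Semi-gradient Q-learning update on the linear weights.
        w[a] += ALPHA * (target - q(s, a)) * features(s)
        s = s2
        if done:
            break
```

Swap the two crude features for a tabular (one-hot) encoding or tile coding and the very same update rule solves this little problem almost immediately; the slowness people observe usually lives in choices like these, not in "RL" itself.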
A main problem with this myth is that it invites the question "as compared to what?". There are at least two cases of interest here:
- Either there is no other way to solve the problem instance for which the speaker is asserting that RL is slow, in which case, in my view, the statement is at least premature if not unwarranted; or
- There is some other efficient non-RL (and often heavily engineered) method for the problem instance of interest, but the speaker has compared it against some vanilla (or perhaps tabula rasa) RL method and concluded that RL is slow(er). The latter case is never convincing to me, for it limits the claim to one very specific RL algorithm and one choice of algorithm parameters, state representation (features), and function approximation.
Overall, in my view the current situation in RL is as follows. We don't yet have RL methods that are as off-the-shelf as, say, boosting with decision stumps or SVMs are for classification. This makes everyone who tries RL naively on their problem and fails to get an exciting answer feel free to pronounce RL too slow; that is lazy and bad for the field. There continue to be RL success stories on control, operations research, finance, robotics, and other AI problems. We should celebrate these successes and make them widely known. At the same time, we should work towards a systematization of the rules of thumb and accumulated wisdom so that success in applying RL can be shared more widely (and we don't have to call on Andrew Ng for the really difficult applications :-).