RL is model-free (or direct)

Page history last edited by Satinder Singh 15 years, 2 months ago

The ambition of this page is to explore (and hopefully dispel) the myth that RL is model-free (or direct).


This myth probably stems from the fact that much of the early work in RL focused on model-free algorithms such as TD(lambda) and Q-learning. By contrast, much of the work in adaptive control and operations research on learning to control unknown dynamical systems was model-based (i.e., it used experience with the system to identify or model the system, and then used the learned model to derive a good policy). Some of the early excitement in RL was indeed due to the fact that the field had contributed the first provably convergent model-free algorithms (thereby adding something new to learning control).

(Note that model-free algorithms are often referred to as direct algorithms and model-based algorithms as indirect algorithms.)
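To make the direct/indirect distinction concrete, here is a minimal sketch in Python. The two-state MDP, the function names, and all parameter values are hypothetical and chosen only for illustration; they do not come from any of the work mentioned on this page. Both learners consume the same batch of experience gathered by a uniformly random behaviour policy. The direct learner applies the Q-learning update to each transition; the indirect learner first estimates transition probabilities and mean rewards from counts, then plans in the estimated model with value iteration (a certainty-equivalence approach, not any specific published algorithm).

```python
import random

# Hypothetical 2-state, 2-action MDP, used only for illustration.
# P[s][a] is a list of (probability, next_state, reward) outcomes.
N_STATES, N_ACTIONS, GAMMA = 2, 2, 0.9
P = {
    0: {0: [(1.0, 0, 0.0)], 1: [(0.8, 1, 0.0), (0.2, 0, 0.0)]},
    1: {0: [(1.0, 0, 0.0)], 1: [(1.0, 1, 1.0)]},
}


def step(s, a, rng):
    """Sample (next_state, reward) from the true, hidden dynamics."""
    u, cum = rng.random(), 0.0
    for prob, s2, r in P[s][a]:
        cum += prob
        if u < cum:
            return s2, r
    _, s2, r = P[s][a][-1]  # guard against floating-point round-off
    return s2, r


def collect(n, rng):
    """Gather n transitions (s, a, r, s2) with a uniformly random policy."""
    data, s = [], 0
    for _ in range(n):
        a = rng.randrange(N_ACTIONS)
        s2, r = step(s, a, rng)
        data.append((s, a, r, s2))
        s = s2
    return data


def q_learning(data, alpha=0.1):
    """Model-free (direct): update Q from each transition, keep no model."""
    Q = [[0.0] * N_ACTIONS for _ in range(N_STATES)]
    for s, a, r, s2 in data:
        target = r + GAMMA * max(Q[s2])
        Q[s][a] += alpha * (target - Q[s][a])
    return Q


def model_based(data, sweeps=200):
    """Model-based (indirect): estimate a model, then plan in it."""
    counts = [[[0] * N_STATES for _ in range(N_ACTIONS)] for _ in range(N_STATES)]
    reward_sum = [[0.0] * N_ACTIONS for _ in range(N_STATES)]
    for s, a, r, s2 in data:
        counts[s][a][s2] += 1
        reward_sum[s][a] += r
    Q = [[0.0] * N_ACTIONS for _ in range(N_STATES)]
    for _ in range(sweeps):  # value iteration on the estimated model
        for s in range(N_STATES):
            for a in range(N_ACTIONS):
                n = sum(counts[s][a])
                if n == 0:
                    continue  # unvisited pair: leave the estimate at zero
                r_hat = reward_sum[s][a] / n
                expected_next = sum(
                    counts[s][a][s2] / n * max(Q[s2]) for s2 in range(N_STATES)
                )
                Q[s][a] = r_hat + GAMMA * expected_next
    return Q


def greedy(Q):
    """Greedy action in each state under the given Q-table."""
    return [max(range(N_ACTIONS), key=lambda a: Q[s][a]) for s in range(N_STATES)]


rng = random.Random(0)
experience = collect(20000, rng)
print("model-free greedy policy: ", greedy(q_learning(experience)))
print("model-based greedy policy:", greedy(model_based(experience)))
```

The only structural difference is what each learner retains from experience: the direct learner keeps a table of action values, while the indirect learner keeps counts that define an estimated model and derives its values from that model.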

However, even in the early days there were exciting model-based RL algorithms such as Moore's prioritized sweeping and Sutton's Dyna. There were some friendly debates within the RL community as to whether either model-based or model-free methods could be shown to be clearly superior to the other. Empirical RL research provided successes for both model-based and model-free RL approaches (see Successes of RL page for examples). Kearns & Singh conducted an analysis of the sample complexity of model-based and model-free algorithms (ignoring exploration) and showed them to be nearly the same. In any case, over time it became clear that any serious progress on the larger RL question of building agents that can learn to be broadly competent in real-world environments would surely require the agents to explicitly represent knowledge about their world, including knowledge that predicts the consequences of actions.

The debate has since largely moved on to the question of good representations of model knowledge (e.g., ICML 2004 had two workshops on this topic, one on predictive representations of knowledge and one on relational representations for RL).

Thus, this myth should by now have been relegated to the dustbin of history, for even a casual perusal of current research and the set of algorithms available should make it clear that building models is a key ingredient of research and progress in RL.
