RL is tabula rasa

Page history last edited by Satinder Singh 11 years, 3 months ago

The ambition of this page is to discuss (and hopefully dispel) the myth that RL is tabula rasa.


Much of the early research in RL focused on developing foundational ideas such as "temporal differences", "eligibility traces", and "caching the results of search in value functions", and on algorithms such as TD(lambda), actor-critic, REINFORCE, Q-learning, SARSA, and Prioritized Sweeping. Naturally, much of the early analysis and empirical work aimed at understanding the properties of these algorithms, and the cleanest cases to consider were often tabula rasa learning. Even today, comparing algorithms under nearly tabula rasa conditions may be appropriate in some cases. Thus readers of RL papers are quite used to seeing tabula rasa RL in action. This probably accounts for the myth.

The reasons this is a myth are:

  1. In all serious applications of RL (by which I mean research in which the application is of primary interest), all kinds of knowledge are brought to bear on the learning architecture and algorithm. Very often this takes the form of good state representations. For example, the best-performing versions of Tesauro's TD-Gammon used hand-crafted features that he had developed through knowledge of the game and prior experience with Neurogammon (his attempt to learn backgammon using supervised learning). In other applications, there is knowledge about good, compact parameterizations of a model of the environment, and this can be used to learn a model efficiently from experience with the real environment; e.g., this approach was used with great success by Andrew Ng and his coauthors in their work on helicopter control. Other approaches to inserting domain knowledge include using compact parametric policy spaces and policy search algorithms to learn efficiently. (See the Successes of RL page for references to these and other applications.)
  2. As the basic algorithms of RL have been fleshed out (see the Algorithms of RL page), research attention within the field is increasingly focused on larger issues: knowledge representation for RL and AI (whether learned by or inserted into the agent), increasing the scope and range of applications of RL (e.g., the 2004 workshop on real-life RL), rethinking the notions of state, action and reward (the basic components of RL formulations), and developing RL architectures for building agents that achieve broad competence in their environment. All of these are examples of *non* tabula rasa RL.
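
To make the point about state representations in item 1 concrete, here is a minimal sketch in Python. It is not taken from TD-Gammon or any of the applications mentioned above; the random-walk task, the function names, and the parameter values are all assumptions chosen for illustration. The same TD(0) update is run twice: once with a one-hot (tabular, tabula rasa) representation, and once with a hand-crafted two-feature representation that encodes the assumed prior knowledge that value varies linearly with position.

```python
import math
import random


def one_hot_features(state, n_states):
    """Tabula rasa representation: one independent weight per state."""
    phi = [0.0] * n_states
    phi[state] = 1.0
    return phi


def handcrafted_features(state, n_states):
    """Hypothetical domain knowledge: value is assumed linear in position."""
    return [1.0, state / (n_states - 1)]


def dot(w, phi):
    return sum(wi * xi for wi, xi in zip(w, phi))


def td0_random_walk(features, n_states=19, episodes=50, alpha=0.05, seed=0):
    """Semi-gradient TD(0) on an undiscounted random walk (illustrative task).

    States are 0..n_states-1. Stepping off the right end gives reward 1,
    stepping off the left end gives reward 0; both end the episode.
    """
    rng = random.Random(seed)
    w = [0.0] * len(features(0, n_states))
    for _ in range(episodes):
        s = n_states // 2  # every episode starts in the middle
        while True:
            s_next = s + rng.choice((-1, 1))
            done = s_next < 0 or s_next >= n_states
            reward = 1.0 if s_next >= n_states else 0.0
            phi = features(s, n_states)
            v_next = 0.0 if done else dot(w, features(s_next, n_states))
            delta = reward + v_next - dot(w, phi)  # temporal-difference error
            w = [wi + alpha * delta * xi for wi, xi in zip(w, phi)]
            if done:
                break
            s = s_next
    return w


def rmse(w, features, n_states=19):
    """Root-mean-square error against the exact values (s + 1) / (n_states + 1)."""
    sq = 0.0
    for s in range(n_states):
        true_v = (s + 1) / (n_states + 1)
        sq += (dot(w, features(s, n_states)) - true_v) ** 2
    return math.sqrt(sq / n_states)


if __name__ == "__main__":
    for name, feats in [("one-hot", one_hot_features),
                        ("hand-crafted", handcrafted_features)]:
        w = td0_random_walk(feats)
        print(name, "weights:", len(w), "RMSE:", round(rmse(w, feats), 3))
```

The learning rule is identical in both runs; the only difference is the feature map, which is where the designer's knowledge enters. The hand-crafted version learns 2 weights instead of 19 and generalizes across states, but only because the linearity assumption happens to hold exactly for this toy task.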
