Successes of Reinforcement Learning

Page history last edited by Satinder Singh 11 years, 3 months ago

This page collects reinforcement learning (RL) success stories. By "success story" we mean an application of RL methods to a substantial, difficult problem domain that is of independent interest to some community. This definition is deliberately loose; if it leads to a longer list than a stricter one would, that is fine.





  1. (Quadruped Gait Control) Policy Gradient Reinforcement Learning for Fast Quadrupedal Locomotion, by Nate Kohl and Peter Stone.
  2. (Quadruped Ball Acquisition) Learning Ball Acquisition on a Physical Robot, by Peggy Fidelman and Peter Stone.
  3. (Air Hockey) Learning from Observation Using Primitives, and particularly the movie of a humanoid robot playing air hockey. An example paper.
  4. (Active Sensing) Active Sensing Using Reinforcement Learning by Cody Kwok and Dieter Fox.




  1. (Helicopter control) Inverted autonomous helicopter flight via reinforcement learning, by Andrew Y. Ng, Adam Coates, Mark Diel, Varun Ganapathi, Jamie Schulte, Ben Tse, Eric Berger and Eric Liang. In International Symposium on Experimental Robotics, 2004.
  2. (Helicopter control) Autonomous helicopter control using Reinforcement Learning Policy Search Methods, by J.A. Bagnell and J. Schneider. In Proceedings of the International Conference on Robotics and Automation, 2001.


Operations Research


  1. (Pricing) Opportunities and Challenges in Using Online Preference Data for Vehicle Pricing: A Case Study at General Motors by P. Rusmevichientong, J. A. Salisbury, L. T. Truss, B. Van Roy, and P. W. Glynn.
  2. (Vehicle Routing) Scaling Average-reward Reinforcement Learning for Product Delivery by S. Proper and P. Tadepalli.
  3. (Targeted Marketing) Cross Channel Optimized Marketing by Reinforcement Learning, by Naoki Abe, Naval Verma, Chid Apte and Robert Schroko, Proceedings of the Tenth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, August 2004.




  1. (Backgammon) Temporal difference learning and TD-Gammon by Gerald Tesauro, Communications of the ACM, 38(3), March 1995.
  2. (Solitaire) Solitaire: Man Versus Machine, by X. Yan, P. Diaconis, P. Rusmevichientong, and B. Van Roy, to appear in Advances in Neural Information Processing Systems 17, MIT Press, 2005.
  3. (Chess) The KnightCap program, which went from a rating of 1600 to a rating of 2100 by altering its heuristic evaluation function using TD-lambda.
  4. (Checkers) Temporal Difference Learning Applied to a High-Performance Game-Playing Program by Jonathan Schaeffer, Markian Hlynka, and Vili Jussila, International Joint Conference on Artificial Intelligence (IJCAI), pp. 529-534, 2001.




  1. (Spoken Dialogue Systems) Optimizing Dialogue Management with Reinforcement Learning: Experiments with the NJFun System. S. Singh, D. Litman, M. Kearns and M. Walker. In Journal of Artificial Intelligence Research (JAIR), Volume 16, pages 105-133, 2002.
  2. (Software Agent in MOOs) Cobot in LambdaMOO: An Adaptive Social Statistics Agent. C. Isbell, M. Kearns, S. Singh, C. Shelton, P. Stone and D. Korman.




  1. (Trading) Learning to Trade via Direct Reinforcement. John Moody and Matthew Saffell, IEEE Transactions on Neural Networks, Vol 12, No 4, July 2001.


Complex Simulations


  1. (Robot Soccer) Scaling Reinforcement Learning toward RoboCup Soccer, by Peter Stone and Richard S. Sutton, Proceedings of the Eighteenth International Conference on Machine Learning, pp. 537–544, Morgan Kaufmann, San Francisco, CA, 2001.

