Successes of Reinforcement Learning

Saved by Satinder Singh on March 21, 2009 at 9:04:17 am

The aim of this page is to collect RL success stories. By "success story" we mean an application of RL methods to a substantial and difficult problem domain that is of independent interest (to some community). Yes, this is vague; if that produces a longer list than a stricter definition would, that is acceptable.

Jump to successes in: [[#RoboticS][Robotics]], [[#ControL][Control]], [[#OperationsresearcH][Operations Research]], [[#GameS][Games]], [[#HcI][Human-Computer Interaction]], [[#EcO][Economics/Finance]], [[#CoS][Complex Simulation]], [[#MkT][Marketing]]

-------------

#RoboticS

* %RED%Robotics%ENDCOLOR%

1 (__Quadruped Gait Control__) Policy Gradient Reinforcement Learning for Fast Quadrupedal Locomotion by Nate Kohl and Peter Stone.

1 (__Quadruped Ball Acquisition__) Learning Ball Acquisition on a Physical Robot by Peggy Fidelman and Peter Stone.

1 (__Air Hockey__) [[http://www.cc.gatech.edu/projects/Learning_Research/][Learning from Observation Using Primitives]]; see in particular the movie of a [[http://www.cc.gatech.edu/project/Learning_Research/mpeg/hockeyfullsmall.avi][humanoid robot playing air hockey]] and an example [[http://www.cc.gatech.edu/projects/Learning_Research/Docs/dbent_iros02.pdf][paper]].

1 (__Active Sensing__) [[http://www.cs.washington.edu/robotics/abstracts/active-sensing-iros-04.abstract.html][Active Sensing Using Reinforcement Learning]] by Cody Kwok and Dieter Fox.
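The quadruped gait entry applies policy-gradient search to a parameterized walk. Below is a minimal sketch of one such step, assuming a generic finite-difference estimator: each parameter is randomly nudged by -epsilon, 0 or +epsilon, the perturbed policies are scored, and the average scores of the "+" and "-" groups are compared per parameter. The names `finite_difference_step` and `evaluate` are hypothetical, and the exact procedure in the Kohl and Stone paper may differ.

```python
import numpy as np

def finite_difference_step(theta, evaluate, epsilon=0.05, eta=0.1,
                           n_samples=15, rng=None):
    """One hill-climbing step using a finite-difference gradient estimate.

    theta    : current policy parameters (e.g. gait timings and offsets)
    evaluate : hypothetical function returning a score for a parameter
               vector (for a gait, something like measured walking speed)
    """
    rng = np.random.default_rng() if rng is None else rng
    theta = np.asarray(theta, dtype=float)
    # Each row perturbs every parameter by -epsilon, 0 or +epsilon at random.
    signs = rng.integers(-1, 2, size=(n_samples, theta.size))
    scores = np.array([evaluate(theta + epsilon * s) for s in signs])

    grad = np.zeros_like(theta)
    for j in range(theta.size):
        plus = scores[signs[:, j] == 1]
        minus = scores[signs[:, j] == -1]
        # Skip a dimension if one side happened to get no samples.
        if len(plus) and len(minus):
            grad[j] = plus.mean() - minus.mean()

    norm = np.linalg.norm(grad)
    if norm == 0:
        return theta
    # Normalize so every step has length eta (a fixed-size uphill move).
    return theta + eta * grad / norm
```

On a real robot each call to `evaluate` is an expensive physical trial, which is why an estimator that needs only scores, and no model of the robot, is attractive here.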

#ControL

* %RED%Control%ENDCOLOR%

1 (__Helicopter control__) [[http://www.robotics.stanford.edu/~ang/papers/iser04-invertedflight.pdf][Inverted autonomous helicopter flight via reinforcement learning]], by Andrew Y. Ng, Adam Coates, Mark Diel, Varun Ganapathi, Jamie Schulte, Ben Tse, Eric Berger and Eric Liang. In International Symposium on Experimental Robotics, 2004.

1 (__Helicopter control__) [[http://www.ri.cmu.edu/pubs/pub_3791.html][Autonomous helicopter control using Reinforcement Learning Policy Search Methods]], by J.A. Bagnell and J. Schneider. In Proceedings of the International Conference on Robotics and Automation, 2001.

#OperationsresearcH

* %RED%Operations Research%ENDCOLOR%

1 (__Pricing__) [[http://www.stanford.edu/~bvr/psfiles/GM-pricing.pdf][Opportunities and Challenges in Using Online Preference Data for Vehicle Pricing: A Case Study at General Motors]] by P. Rusmevichientong, J. A. Salisbury, L. T. Truss, B. Van Roy, and P. W. Glynn.

1 (__Vehicle Routing__) [[http://web.engr.oregonstate.edu/~proper/AAAI04SProper.pdf][Scaling Average-reward Reinforcement Learning for Product Delivery]] by S. Proper and P. Tadepalli.

#GameS

* %RED%Games%ENDCOLOR%

1 (__Backgammon__) [[http://www.research.ibm.com/massive/tdl.html][Temporal difference learning and TD-Gammon]] by Gerald Tesauro, Communications of the ACM, 38(3), March 1995.

1 (__Solitaire__) [[http://www.stanford.edu/~bvr/psfiles/solitaire.pdf][Solitaire: Man Versus Machine]], by X. Yan, P. Diaconis, P. Rusmevichientong, and B. Van Roy, to appear in Advances in Neural Information Processing Systems 17, MIT Press, 2005.

1 (__Chess__) [[http://www.syseng.anu.edu.au/lsg/knightcap.html][The KnightCap program]], which went from a rating of 1600 to a rating of 2100 by adjusting its heuristic evaluation function using TD-lambda. [[http://citeseer.ist.psu.edu/6262.html][CiteSeer]] has a link to the paper.

1 (__Checkers__) [[http://www.cs.ualberta.ca/~jonathan/Papers/Papers/td.ps][Temporal Difference Learning Applied to a High-Performance Game-Playing Program]] by Jonathan Schaeffer, Markian Hlynka, and Vili Jussila, International Joint Conference on Artificial Intelligence (IJCAI), pp. 529-534, 2001.
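The backgammon, chess and checkers entries all adjust a game-evaluation function with temporal-difference learning. Below is a minimal sketch of the core TD(λ) update, assuming a linear evaluation V(s) = w · φ(s), a single outcome at the end of each game, and no discounting; the feature encoding and the function name `td_lambda_episode` are hypothetical.

```python
import numpy as np

def td_lambda_episode(w, features, outcome, alpha=0.01, lam=0.7):
    """One online TD(lambda) pass over a single game.

    w        : weights of a linear evaluation V(s) = w . phi(s)
    features : feature vectors phi(s_0), ..., phi(s_{T-1}) for the positions
               visited in the game (hypothetical encoding)
    outcome  : final game result, e.g. 1.0 for a win, 0.0 for a loss
    """
    w = np.asarray(w, dtype=float).copy()
    trace = np.zeros_like(w)  # accumulating eligibility trace
    for t, phi in enumerate(features):
        phi = np.asarray(phi, dtype=float)
        v_t = w @ phi
        # Target: the next position's evaluation, or the real outcome at game end.
        if t + 1 < len(features):
            v_next = w @ np.asarray(features[t + 1], dtype=float)
        else:
            v_next = outcome
        delta = v_next - v_t        # TD error (no intermediate rewards, gamma = 1)
        trace = lam * trace + phi   # decay old credit, add the current gradient
        w += alpha * delta * trace  # shift values of recently visited positions
    return w
```

Setting `lam=0` gives one-step TD, while values near 1 push each error further back through the game. The programs listed above used richer setups (TD-Gammon, for instance, trained a neural network rather than a linear evaluation), so this shows only the shared update rule.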

#HcI

* %RED%Human-Computer Interaction%ENDCOLOR%

1 (__Spoken Dialogue Systems__) [[http://www.eecs.umich.edu/~baveja/Papers/RLDSjair.pdf][Optimizing Dialogue Management with Reinforcement Learning: Experiments with the NJFun System]]. S. Singh, D. Litman, M. Kearns and M. Walker. In Journal of Artificial Intelligence Research (JAIR), Volume 16, pages 105-133, 2002.

1 (__Software Agent in MOOs__) [[http://www.eecs.umich.edu/~baveja/Papers/CobotNIPS01.pdf][Cobot: A Social Reinforcement Learning Agent]]. C. Isbell, C. Shelton, M. Kearns, S. Singh, and P. Stone (2002). In Proceedings of Neural Information Processing Systems 14 (NIPS), pp. 1393-1400.

#EcO

* %RED%Economics/Finance%ENDCOLOR%

1 (__Trading__) Learning to Trade via Direct Reinforcement. John Moody and Matthew Saffell, IEEE Transactions on Neural Networks, Vol 12, No 4, July 2001.

#CoS

* %RED%Complex Simulation%ENDCOLOR%

1 (__Robot Soccer__) [[http://www.cs.utexas.edu/~pstone/Papers/bib2html-links/ICML2001.pdf][Scaling Reinforcement Learning toward RoboCup Soccer]], by Peter Stone and Richard S. Sutton, Proceedings of the Eighteenth International Conference on Machine Learning, pp. 537-544, Morgan Kaufmann, San Francisco, CA, 2001.

#MkT

* %RED%Marketing%ENDCOLOR%

1 (__Targeted Marketing__) [[http://www.research.ibm.com/people/n/nabe/kdd04AVAS.pdf][Cross Channel Optimized Marketing by Reinforcement Learning]], by Naoki Abe, Naval Verma, Chid Apte and Robert Schroko, Proceedings of the Tenth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, August 2004.
