The ambition of this page is to collect RL success stories. By "success story" we mean an application of RL methods to a substantial and difficult problem domain that is of independent interest (to some community). Yes, this is vague and if that leads to a longer list than otherwise, that may be ok.
Jump to successes in: [[#RoboticS][Robotics]], [[#ControL][Control]], [[#OperationsresearcH][Operations Research]], [[#GameS][Games]], [[#HcI][Human-Computer Interaction]], [[#EcO][Economics/Finance]], [[#CoS][Complex Simulation]], [[#MkT][Marketing]]
-------------
<font size = 2>
_To edit this page, just click on Edit at the top left of the screen (you will have to register once and then remember your username and password *or* you can use Username: <nop>AnonyMous with Password: <nop>AnonyMous)_.
<font size = 3>
#RoboticS
* %RED%Robotics%ENDCOLOR%
1 (__Quadruped Gait Control__) [[http://www.cs.utexas.edu/~pstone/Papers/bib2html-links/icra04.pdf][Policy Gradient Reinforcement Learning for Fast Quadrupedal Locomotion]] by Nate Kohl and Peter Stone
1 (__Quadruped Ball Acquisition__) [[http://www.cs.utexas.edu/~pstone/Papers/bib2html-links/ISRA2004-chinpinch.pdf][Learning Ball Acquisition on a Physical Robot]] by Peggy Fidelman and Peter Stone
1 (__Air Hockey__) [[http://www.cc.gatech.edu/projects/Learning_Research/][Learning from Observation Using Primitives]], and particularly the movie of a [[http://www.cc.gatech.edu/projects/Learning_Research/mpeg/hockeyfullsmall.avi][humanoid robot playing air hockey]]. An example [[http://www.cc.gatech.edu/projects/Learning_Research/Docs/dbent_iros02.pdf][paper]].
1 (__Active Sensing__) [[http://www.cs.washington.edu/robotics/abstracts/active-sensing-iros-04.abstract.html][Active Sensing Using Reinforcement Learning]] by Cody Kwok and Dieter Fox.
#ControL
* %RED%Control%ENDCOLOR%
1 (__Helicopter control__) [[http://www.robotics.stanford.edu/~ang/papers/iser04-invertedflight.pdf][Inverted autonomous helicopter flight via reinforcement learning]], by Andrew Y. Ng, Adam Coates, Mark Diel, Varun Ganapathi, Jamie Schulte, Ben Tse, Eric Berger and Eric Liang. In International Symposium on Experimental Robotics, 2004.
1 (__Helicopter control__) [[http://www.ri.cmu.edu/pubs/pub_3791.html][Autonomous helicopter control using Reinforcement Learning Policy Search Methods]], by J.A. Bagnell and J. Schneider. In Proceedings of the International Conference on Robotics and Automation, 2001.
#OperationsresearcH
* %RED%Operations Research%ENDCOLOR%
1 (__Pricing__) [[http://www.stanford.edu/~bvr/psfiles/GM-pricing.pdf][Opportunities and Challenges in Using Online Preference Data for Vehicle Pricing: A Case Study at General Motors]] by P. Rusmevichientong, J. A. Salisbury, L. T. Truss, B. Van Roy, and P. W. Glynn.
1 (__Vehicle Routing__) [[http://web.engr.oregonstate.edu/~proper/AAAI04SProper.pdf][Scaling Average-reward Reinforcement Learning for Product Delivery]] by S. Proper and P. Tadepalli.
#GameS
* %RED%Games%ENDCOLOR%
1 (__Backgammon__) [[http://www.research.ibm.com/massive/tdl.html][Temporal difference learning and TD-Gammon]] by Gerald Tesauro, Communications of the ACM, 38(3), March 1995.
1 (__Solitaire__) [[http://www.stanford.edu/~bvr/psfiles/solitaire.pdf][Solitaire: Man Versus Machine]], by X. Yan, P. Diaconis, P. Rusmevichientong, and B. Van Roy, to appear in Advances in Neural Information Processing Systems 17, MIT Press, 2005.
1 (__Chess__) [[http://www.syseng.anu.edu.au/lsg/knightcap.html][The KnightCap program]], which went from a rating of 1600 to a rating of 2100 by altering its heuristic evaluation function using TD-lambda. [[http://citeseer.ist.psu.edu/6262.html][CiteSeer]] has a link to the paper.
1 (__Checkers__) [[http://www.cs.ualberta.ca/~jonathan/Papers/Papers/td.ps][Temporal Difference Learning Applied to a High-Performance Game-Playing Program]] by Jonathan Schaeffer, Markian Hlynka, and Vili Jussila, International Joint Conference on Artificial Intelligence (IJCAI), pp. 529-534, 2001.
#HcI
* %RED%Human-Computer Interaction%ENDCOLOR%
1 (__Spoken Dialogue Systems__) [[http://www.eecs.umich.edu/~baveja/Papers/RLDSjair.pdf][Optimizing Dialogue Management with Reinforcement Learning: Experiments with the NJFun System]]. S. Singh, D. Litman, M. Kearns and M. Walker. In Journal of Artificial Intelligence Research (JAIR), Volume 16, pages 105-133, 2002.
1 (__Software Agent in MOOs__) [[http://www.eecs.umich.edu/~baveja/Papers/CobotNIPS01.pdf][Cobot: A Social Reinforcement Learning Agent]]. C. Isbell, C. Shelton, M. Kearns, S. Singh, and P. Stone (2002). In Proceedings of Neural Information Processing Systems 14 (NIPS), pp. 1393-1400.
#EcO
* %RED%Economics/Finance%ENDCOLOR%
1 (__Trading__) Learning to Trade via Direct Reinforcement. John Moody and Matthew Saffell, IEEE Transactions on Neural Networks, Vol 12, No 4, July 2001.
#CoS
* %RED%Complex Simulation%ENDCOLOR%
1 (__Robot Soccer__) [[http://www.cs.utexas.edu/~pstone/Papers/bib2html-links/ICML2001.pdf][Scaling Reinforcement Learning toward RoboCup Soccer]], by Peter Stone and Richard S. Sutton, Proceedings of the Eighteenth International Conference on Machine Learning, pp. 537–544, Morgan Kaufmann, San Francisco, CA, 2001.
#MkT
* %RED%Marketing%ENDCOLOR%
1 (__Targeted Marketing__) [[http://www.research.ibm.com/people/n/nabe/kdd04AVAS.pdf][Cross Channel Optimized Marketing by Reinforcement Learning]], by Naoki Abe, Naval Verma, Chid Apte and Robert Schroko, Proceedings of the Tenth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, August 2004.