Keywords :
Monte-Carlo Search; Algorithm Selection; Grammar of Algorithms
Abstract :
[en] Much current research in AI and games is being devoted to Monte Carlo search (MCS) algorithms. While the quest for a single unified MCS algorithm that would perform well on all problems is of major interest for AI, practitioners often know in advance the problem they want to solve, and spend plenty of time exploiting this knowledge to customize their MCS algorithm in a problem-driven way. We propose an MCS algorithm discovery scheme to perform this in an automatic and reproducible way. We first introduce a grammar over MCS algorithms that enables inducing a rich space of candidate algorithms. Afterwards, we search in this space for the algorithm that performs best on average for a given distribution of training problems. We rely on multi-armed bandits to approximately solve this optimization problem. The experiments, performed on three different domains, show that our approach enables discovering algorithms that outperform several well-known MCS algorithms such as Upper Confidence bounds applied to Trees and Nested Monte Carlo search. We also show that the discovered algorithms are generally quite robust with respect to changes in the distribution over the training problems.
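The abstract's optimization step treats each candidate algorithm drawn from the grammar as an arm of a multi-armed bandit and spends the evaluation budget adaptively. The sketch below is not the paper's actual discovery procedure; it is a generic UCB1 sketch of that idea, assuming candidates are black-box objects and `evaluate` is a hypothetical callback returning a stochastic score in [0, 1] for one run on a training problem.

```python
import math
import random

def ucb1_select(n_iterations, candidates, evaluate, c=math.sqrt(2)):
    """Pick the best candidate under a fixed evaluation budget using UCB1.

    candidates: list of arbitrary objects (e.g. algorithms induced by a grammar).
    evaluate:   callback returning a stochastic score in [0, 1] for one run
                of a candidate on a problem drawn from the training distribution.
    """
    counts = [0] * len(candidates)   # times each candidate was evaluated
    sums = [0.0] * len(candidates)   # cumulative scores per candidate

    for t in range(1, n_iterations + 1):
        # Evaluate each candidate once before applying the UCB1 index.
        untried = [i for i, n in enumerate(counts) if n == 0]
        if untried:
            i = untried[0]
        else:
            # UCB1 index: empirical mean plus an exploration bonus that
            # shrinks as a candidate accumulates evaluations.
            i = max(range(len(candidates)),
                    key=lambda k: sums[k] / counts[k]
                    + c * math.sqrt(math.log(t) / counts[k]))
        counts[i] += 1
        sums[i] += evaluate(candidates[i])

    # Recommend the most-evaluated candidate (the one the bandit favored).
    return candidates[max(range(len(candidates)), key=lambda k: counts[k])]
```

For instance, with two candidates whose runs succeed with probability 0.2 and 0.8 respectively, a few hundred iterations suffice for the index to concentrate evaluations on, and hence recommend, the stronger one.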
Disciplines :
Computer science
Author, co-author :
Maes, Francis
Lupien St-Pierre, David ; Université de Liège - ULiège > Dép. d'électric., électron. et informat. (Inst. Montefiore)
Ernst, Damien ; Université de Liège - ULiège > Dép. d'électric., électron. et informat. (Inst. Montefiore) > Smart grids
Language :
English
Title :
Monte Carlo search algorithm discovery for single-player games
Publication date :
September 2013
Journal title :
IEEE Transactions on Computational Intelligence and AI in Games
ISSN :
1943-068X
eISSN :
1943-0698
Publisher :
Institute of Electrical and Electronics Engineers, United States - New Jersey
Bibliography
L. Kocsis and C. Szepesvári, "Bandit based Monte Carlo planning," in Proc. 17th Eur. Conf. Mach. Learn., 2006, pp. 282-293.
R. Coulom, "Efficient selectivity and backup operators in Monte Carlo tree search," in Proc. 5th Int. Conf. Comput. Games, Turin, Italy, 2006, pp. 72-83.
T. Cazenave, "Nested Monte Carlo search," in Proc. 21st Int. Joint Conf. Artif. Intell., 2009, pp. 456-461.
J. Méhat and T. Cazenave, "Combining UCT and nested Monte Carlo search for single-player general game playing," IEEE Trans. Comput. Intell. AI Games, vol. 2, no. 4, pp. 271-277, Dec. 2010.
G. Tesauro and G. R. Galperin, "On-line policy improvement using Monte Carlo search," Proc. Neural Inf. Process. Syst., vol. 96, pp. 1068-1074, 1996.
P. Auer, N. Cesa-Bianchi, and P. Fischer, "Finite-time analysis of the multiarmed bandit problem," Mach. Learn., vol. 47, no. 2, pp. 235-256, 2002.
T. Cazenave, "Reflexive Monte Carlo search," in Proc. Comput. Games Workshop, Amsterdam, The Netherlands, 2007, pp. 165-173.
G. Chaslot, S. de Jong, J.-T. Saito, and J. Uiterwijk, "Monte-Carlo tree search in production management problems," in Proc. Benelux Conf. Artif. Intell., 2006, pp. 91-98.
M. P. D. Schadd, M. H. M. Winands, H. J. van den Herik, G. M. J.-B. Chaslot, and J. W. H. M. Uiterwijk, "Single-player Monte-Carlo tree search," in Computers and Games, ser. Lecture Notes in Computer Science. Berlin, Germany: Springer-Verlag, 2008, vol. 5131, pp. 1-12.
F. de Mesmay, A. Rimmel, Y. Voronenko, and M. Püschel, "Bandit-based optimization on graphs with application to library performance tuning," in Proc. Int. Conf. Mach. Learn., Montréal, QC, Canada, 2009, pp. 729-736.
G.-B. Chaslot, J.-B. Hoock, J. Perez, A. Rimmel, O. Teytaud, and M. Winands, "Meta Monte Carlo tree search for automatic opening book generation," in Proc. IJCAI Workshop Gen. Intell. Game Playing Agents, 2009, pp. 7-12.
F. Maes, L. Wehenkel, and D. Ernst, "Learning to play K-armed bandit problems," in Proc. Int. Conf. Agents Artif. Intell., Vilamoura, Algarve, Portugal, Feb. 2012.
F. Maes, L. Wehenkel, and D. Ernst, "Meta-learning of exploration/exploitation strategies: The multi-armed bandit case," in Proc. Int. Conf. Agents Artif. Intell., 2012 [Online]. Available: arXiv:1207.5208
M. Castronovo, F. Maes, R. Fonteneau, and D. Ernst, "Learning exploration/exploitation strategies for single trajectory reinforcement learning," in Proc. 10th Eur. Workshop Reinforcement Learn., Edinburgh, Scotland, Jun. 2012.
F. Maes, R. Fonteneau, L. Wehenkel, and D. Ernst, "Policy search in a space of simple closed-form formulas: Towards interpretability of reinforcement learning," in Discovery Science, ser. Lecture Notes in Computer Science. Berlin, Germany: Springer-Verlag, Oct. 2012, vol. 7569, pp. 37-51.
S. Bubeck, R. Munos, and G. Stoltz, "Pure exploration in multi-armed bandits problems," in Proc. Algorithmic Learn. Theory, 2009, pp. 23-37.
J. Koza and R. Poli, "Genetic programming," in Proc. Search Methodol., E. K. Burke and G. Kendall, Eds., 2005, pp. 127-164.
P.-A. Coquelin and R. Munos, "Bandit algorithms for tree search," in Proc. Uncertainty Artif. Intell., Vancouver, BC, Canada, 2007, arXiv preprint cs/0703062.
J.-B. Hoock and O. Teytaud, "Bandit-based genetic programming," in Proc. 13th Eur. Conf. Genetic Programm., 2010, pp. 268-277.
Z. Geem, "Harmony search algorithm for solving Sudoku," in Proc. Int. Conf. Knowl.-Based Intell. Inf. Eng. Syst., 2007, pp. 371-378.
T. Cazenave, "Nested Monte Carlo expression discovery," in Proc. Eur. Conf. Artif. Intell., Lisbon, Portugal, 2010, pp. 1057-1058.
N. Uy, N. Hoai, M. O'Neill, R. McKay, and E. Galván-López, "Semantically-based crossover in genetic programming: Application to realvalued symbolic regression," Genetic Programm. Evolvable Mach., vol. 12, no. 2, pp. 91-119, 2011.
C. Boyer, 2012 [Online]. Available: http://www.morpionsolitaire.com
C. Rosin, "Nested rollout policy adaptation for Monte Carlo tree search," in Proc. 22nd Int. Joint Conf. Artif. Intell., Barcelona, Spain, 2011, pp. 649-654.
E. Demaine, M. Demaine, A. Langerman, and S. Langerman, "Morpion solitaire," Theory Comput. Syst., vol. 39, no. 3, pp. 439-453, 2006.
P. Perrick, D. St-Pierre, F. Maes, and D. Ernst, "Comparison of different selection strategies in Monte Carlo tree search for the game of Tron," in Proc. IEEE Conf. Comput. Intell. Games, Granada, Spain, 2012, pp. 242-249.
G. Chaslot, M. Winands, and H. van den Herik, "Parameter tuning by the cross-entropy method," in Proc. Eur. Workshop Reinforcement Learn., 2008.
R. Coulom, "CLOP: Confident local optimization for noisy black-box parameter tuning," in Advances in Computer Games, ser. Lecture Notes in Computer Science. Berlin, Germany: Springer-Verlag, 2012, vol. 7168, pp. 146-157.
F. Maes, L. Wehenkel, and D. Ernst, "Optimized look-ahead tree search policies," in Recent Advances in Reinforcement Learning, ser. Lecture Notes in Computer Science. Berlin, Germany: Springer-Verlag, 2012, vol. 7188, pp. 189-200.
O. Chapelle and L. Li, "An empirical evaluation of Thompson sampling," in Proc. Neural Inf. Process. Syst., 2011, pp. 2249-2257.
A. Bourki, M. Coulm, P. Rolet, O. Teytaud, and P. Vayssiere, "Parameter tuning by simple regret algorithms and multiple simultaneous hypothesis testing," in Proc. Int. Conf. Inf. Control Autom. Robot., 2010.
V. Berthier, H. Doghmen, and O. Teytaud, "Consistency modifications for automatically tuned Monte Carlo tree search," in Proc. 4th Conf. Learn. Intell. Optim., 2010, pp. 111-124.
S. Gelly and D. Silver, "Combining online and offline knowledge in UCT," in Proc. 24th Int. Conf. Mach. Learn., 2007, pp. 273-280.
G. Chaslot, M. Winands, H. van den Herik, J. Uiterwijk, and B. Bouzy, "Progressive strategies for Monte Carlo tree search," in Proc. 10th Joint Conf. Inf. Sci., 2007, pp. 655-661.
D. St-Pierre, Q. Louveaux, and O. Teytaud, "Online sparse bandit for card game," in Advances in Computer Games, ser. Lecture Notes in Computer Science. Berlin, Germany: Springer-Verlag, 2012, vol. 7168, pp. 295-305.
C. Browne, E. Powley, D. Whitehouse, S. Lucas, P. Cowling, P. Rohlfshagen, S. Tavener, D. Perez, S. Samothrakis, and S. Colton, "A survey of Monte Carlo tree search methods," IEEE Trans. Comput. Intell. AI Games, vol. 4, no. 1, pp. 1-43, Mar. 2012.
T. Cazenave, "Evolving Monte Carlo tree search algorithms," Tech. Rep., 2007.
F. Maes, L. Wehenkel, and D. Ernst, "Automatic discovery of ranking formulas for playing with multi-armed bandits," in Proc. 9th Eur. Workshop Reinforcement Learn., 2011, pp. 5-17.
D. Billings, A. Davidson, J. Schaeffer, and D. Szafron, "The challenge of poker," Artif. Intell., vol. 134, no. 1-2, pp. 201-240, 2002.
V. Nannen and A. Eiben, "Relevance estimation and value calibration of evolutionary algorithm parameters," in Proc. 20th Int. Joint Conf. Artif. Intell., 2007, pp. 975-980.