Keywords :
Monte-Carlo Search; Algorithm Selection; Grammar of Algorithms
Abstract :
[en] Much current research in AI and games is being devoted to Monte Carlo search (MCS) algorithms. While the quest for a single unified MCS algorithm that would perform well on all problems is of major interest for AI, practitioners often know in advance the problem they want to solve, and spend plenty of time exploiting this knowledge to customize their MCS algorithm in a problem-driven way. We propose an MCS algorithm discovery scheme to perform this in an automatic and reproducible way. We first introduce a grammar over MCS algorithms that enables inducing a rich space of candidate algorithms. Afterwards, we search in this space for the algorithm that performs best on average for a given distribution of training problems. We rely on multi-armed bandits to approximately solve this optimization problem. The experiments, performed on three different domains, show that our approach enables discovering algorithms that outperform several well-known MCS algorithms such as Upper Confidence bounds applied to Trees and Nested Monte Carlo search. We also show that the discovered algorithms are generally quite robust with respect to changes in the distribution over the training problems.
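The abstract's optimization step treats each candidate algorithm drawn from the grammar as an arm of a multi-armed bandit and spends the evaluation budget adaptively. The sketch below is not the paper's actual discovery procedure; it is a generic UCB1 sketch of that idea, assuming candidates are black-box objects and `evaluate` is a hypothetical callback returning a stochastic score in [0, 1] for one run on a training problem.

```python
import math
import random

def ucb1_select(n_iterations, candidates, evaluate, c=math.sqrt(2)):
    """Pick the best candidate under a fixed evaluation budget using UCB1.

    candidates: list of arbitrary objects (e.g. algorithms induced by a grammar).
    evaluate:   callback returning a stochastic score in [0, 1] for one run
                of a candidate on a problem drawn from the training distribution.
    """
    counts = [0] * len(candidates)   # times each candidate was evaluated
    sums = [0.0] * len(candidates)   # cumulative scores per candidate

    for t in range(1, n_iterations + 1):
        # Evaluate each candidate once before applying the UCB1 index.
        untried = [i for i, n in enumerate(counts) if n == 0]
        if untried:
            i = untried[0]
        else:
            # UCB1 index: empirical mean plus an exploration bonus that
            # shrinks as a candidate accumulates evaluations.
            i = max(range(len(candidates)),
                    key=lambda k: sums[k] / counts[k]
                    + c * math.sqrt(math.log(t) / counts[k]))
        counts[i] += 1
        sums[i] += evaluate(candidates[i])

    # Recommend the most-evaluated candidate (the one the bandit favored).
    return candidates[max(range(len(candidates)), key=lambda k: counts[k])]
```

For instance, with two candidates whose runs succeed with probability 0.2 and 0.8 respectively, a few hundred iterations suffice for the index to concentrate evaluations on, and hence recommend, the stronger one.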
Disciplines :
Computer science
Author, co-author :
Maes, Francis
Lupien St-Pierre, David ; Université de Liège - ULiège > Dép. d'électric., électron. et informat. (Inst. Montefiore)
Ernst, Damien ; Université de Liège - ULiège > Dép. d'électric., électron. et informat. (Inst. Montefiore) > Smart grids
Language :
English
Title :
Monte Carlo search algorithm discovery for single-player games
Publication date :
September 2013
Journal title :
IEEE Transactions on Computational Intelligence and AI in Games
ISSN :
1943-068X
eISSN :
1943-0698
Publisher :
Institute of Electrical and Electronics Engineers, United States - New Jersey
Bibliography
L. Kocsis and C. Szepesvári, "Bandit based Monte Carlo planning," in Proc. 17th Eur. Conf. Mach. Learn., 2006, pp. 282-293.
R. Coulom, "Efficient selectivity and backup operators in Monte Carlo tree search," in Proc. 5th Int. Conf. Comput. Games, Turin, Italy, 2006, pp. 72-83.
T. Cazenave, "Nested Monte Carlo search," in Proc. 21st Int. Joint Conf. Artif. Intell., 2009, pp. 456-461.
J. Méhat and T. Cazenave, "Combining UCT and nested Monte Carlo search for single-player general game playing," IEEE Trans. Comput. Intell. AI Games, vol. 2, no. 4, pp. 271-277, Dec. 2010.
G. Tesauro and G. R. Galperin, "On-line policy improvement using Monte Carlo search," Proc. Neural Inf. Process. Syst., vol. 96, pp. 1068-1074, 1996.
P. Auer, N. Cesa-Bianchi, and P. Fischer, "Finite-time analysis of the multiarmed bandit problem," Mach. Learn., vol. 47, no. 2, pp. 235-256, 2002.
T. Cazenave, "Reflexive Monte Carlo search," in Proc. Comput. Games Workshop, Amsterdam, The Netherlands, 2007, pp. 165-173.
G. Chaslot, S. de Jong, J.-T. Saito, and J. Uiterwijk, "Monte-Carlo tree search in production management problems," in Proc. Benelux Conf. Artif. Intell., 2006, pp. 91-98.
M. P. D. Schadd, M. H. M. Winands, H. J. van den Herik, G. M. J.-B. Chaslot, and J. W. H. M. Uiterwijk, "Single-player Monte-Carlo tree search," in Computers and Games, ser. Lecture Notes in Computer Science. Berlin, Germany: Springer-Verlag, 2008, vol. 5131, pp. 1-12.
F. de Mesmay, A. Rimmel, Y. Voronenko, and M. Püschel, "Bandit-based optimization on graphs with application to library performance tuning," in Proc. Int. Conf. Mach. Learn., Montréal, QC, Canada, 2009, pp. 729-736.
G.-B. Chaslot, J.-B. Hoock, J. Perez, A. Rimmel, O. Teytaud, and M. Winands, "Meta Monte Carlo tree search for automatic opening book generation," in Proc. IJCAI Workshop Gen. Intell. Game Playing Agents, 2009, pp. 7-12.
F. Maes, L. Wehenkel, and D. Ernst, "Learning to play K-armed bandit problems," in Proc. Int. Conf. Agents Artif. Intell., Vilamoura, Algarve, Portugal, Feb. 2012.
F. Maes, L. Wehenkel, and D. Ernst, "Meta-learning of exploration/exploitation strategies: The multi-armed bandit case," in Proc. Int. Conf. Agents Artif. Intell., 2012 [Online]. Available: arXiv:1207.5208
M. Castronovo, F. Maes, R. Fonteneau, and D. Ernst, "Learning exploration/exploitation strategies for single trajectory reinforcement learning," in Proc. 10th Eur. Workshop Reinforcement Learn., Edinburgh, Scotland, Jun. 2012.
F. Maes, R. Fonteneau, L. Wehenkel, and D. Ernst, "Policy search in a space of simple closed-form formulas: Towards interpretability of reinforcement learning," in Discovery Science, ser. Lecture Notes in Computer Science. Berlin, Germany: Springer-Verlag, Oct. 2012, vol. 7569, pp. 37-51.
S. Bubeck, R. Munos, and G. Stoltz, "Pure exploration in multi-armed bandits problems," in Proc. Algorithmic Learn. Theory, 2009, pp. 23-37.
J. Koza and R. Poli, "Genetic programming," in Proc. Search Methodol., E. K. Burke and G. Kendall, Eds., 2005, pp. 127-164.
P.-A. Coquelin and R. Munos, "Bandit algorithms for tree search," in Proc. Uncertainty Artif. Intell., Vancouver, BC, Canada, 2007, arXiv preprint cs/0703062.
J.-B. Hoock and O. Teytaud, "Bandit-based genetic programming," in Proc. 13th Eur. Conf. Genetic Programm., 2010, pp. 268-277.
Z. Geem, "Harmony search algorithm for solving Sudoku," in Proc. Int. Conf. Knowl.-Based Intell. Inf. Eng. Syst., 2007, pp. 371-378.
T. Cazenave, "Nested Monte Carlo expression discovery," in Proc. Eur. Conf. Artif. Intell., Lisbon, Portugal, 2010, pp. 1057-1058.
N. Uy, N. Hoai, M. O'Neill, R. McKay, and E. Galván-López, "Semantically-based crossover in genetic programming: Application to realvalued symbolic regression," Genetic Programm. Evolvable Mach., vol. 12, no. 2, pp. 91-119, 2011.
C. Boyer, 2012 [Online]. Available: http://www.morpionsolitaire.com
C. Rosin, "Nested rollout policy adaptation for Monte Carlo tree search," in Proc. 22nd Int. Joint Conf. Artif. Intell., Barcelona, Spain, 2011, pp. 649-654.
E. Demaine, M. Demaine, A. Langerman, and S. Langerman, "Morpion solitaire," Theory Comput. Syst., vol. 39, no. 3, pp. 439-453, 2006.
P. Perrick, D. St-Pierre, F. Maes, and D. Ernst, "Comparison of different selection strategies in Monte Carlo tree search for the game of Tron," in Proc. IEEE Conf. Comput. Intell. Games, Granada, Spain, 2012, pp. 242-249.
G. Chaslot, M. Winands, and H. van den Herik, "Parameter tuning by the cross-entropy method," in Proc. Eur. Workshop Reinforcement Learn., 2008.
R. Coulom, "CLOP: Confident local optimization for noisy black-box parameter tuning," in Advances in Computer Games, ser. Lecture Notes in Computer Science. Berlin, Germany: Springer-Verlag, 2012, vol. 7168, pp. 146-157.
F. Maes, L. Wehenkel, and D. Ernst, "Optimized look-ahead tree search policies," in Recent Advances in Reinforcement Learning, ser. Lecture Notes in Computer Science. Berlin, Germany: Springer-Verlag, 2012, vol. 7188, pp. 189-200.
O. Chapelle and L. Li, "An empirical evaluation of Thompson sampling," in Proc. Neural Inf. Process. Syst., 2011, pp. 2249-2257.
A. Bourki, M. Coulm, P. Rolet, O. Teytaud, and P. Vayssiere, "Parameter tuning by simple regret algorithms and multiple simultaneous hypothesis testing," in Proc. Int. Conf. Inf. Control Autom. Robot., 2010.
V. Berthier, H. Doghmen, and O. Teytaud, "Consistency modifications for automatically tuned Monte Carlo tree search," in Proc. 4th Conf. Learn. Intell. Optim., 2010, pp. 111-124.
S. Gelly and D. Silver, "Combining online and offline knowledge in UCT," in Proc. 24th Int. Conf. Mach. Learn., 2007, pp. 273-280.
G. Chaslot, M. Winands, H. van den Herik, J. Uiterwijk, and B. Bouzy, "Progressive strategies for Monte Carlo tree search," in Proc. 10th Joint Conf. Inf. Sci., 2007, pp. 655-661.
D. St-Pierre, Q. Louveaux, and O. Teytaud, "Online sparse bandit for card game," in Advances in Computer Games, ser. Lecture Notes in Computer Science. Berlin, Germany: Springer-Verlag, 2012, vol. 7168, pp. 295-305.
C. Browne, E. Powley, D. Whitehouse, S. Lucas, P. Cowling, P. Rohlfshagen, S. Tavener, D. Perez, S. Samothrakis, and S. Colton, "A survey of Monte Carlo tree search methods," IEEE Trans. Comput. Intell. AI Games, vol. 4, no. 1, pp. 1-43, Mar. 2012.
T. Cazenave, "Evolving Monte Carlo tree search algorithms," Tech. Rep., 2007.
F. Maes, L. Wehenkel, and D. Ernst, "Automatic discovery of ranking formulas for playing with multi-armed bandits," in Proc. 9th Eur. Workshop Reinforcement Learn., 2011, pp. 5-17.
D. Billings, A. Davidson, J. Schaeffer, and D. Szafron, "The challenge of poker," Artif. Intell., vol. 134, no. 1-2, pp. 201-240, 2002.
V. Nannen and A. Eiben, "Relevance estimation and value calibration of evolutionary algorithm parameters," in Proc. 20th Int. Joint Conf. Artif. Intell., 2007, pp. 975-980.