Abstract:
Reinforcement learning (RL) is a widely used learning paradigm for adaptive agents. Because exact RL can only be applied to very simple problems, approximate algorithms are usually necessary in practice. Many algorithms for approximate RL rely on basis-function representations of the value function (or of the Q-function). Designing a good set of basis functions without any prior knowledge of the value function (or of the Q-function) can be a difficult task. In this paper, we instead propose a technique to optimize the shapes of a fixed number of basis functions for the approximate fuzzy Q-iteration algorithm. In contrast to other approaches for adapting basis functions in RL, our optimization criterion measures the actual performance of the computed policies in the task, using simulation from a representative set of initial states. A complete algorithm, using cross-entropy optimization of triangular fuzzy membership functions, is given and applied to the car-on-the-hill example.
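The sketch below illustrates the general idea of cross-entropy (CE) optimization over the centers of triangular membership functions; it is a minimal illustration, not the authors' implementation. The function `evaluate_policy_return` is a hypothetical placeholder: in the paper's setting, the score of a candidate set of membership functions would be obtained by running fuzzy Q-iteration with those functions and then simulating the resulting greedy policy from a representative set of initial states.

```python
# Minimal sketch of cross-entropy optimization of membership-function centers.
# Assumptions: the CE sampling distribution is an independent Gaussian per
# center, and the score function below is a dummy stand-in for the real
# simulation-based performance criterion described in the abstract.
import numpy as np


def evaluate_policy_return(centers):
    """Hypothetical placeholder score.

    Replace with: run fuzzy Q-iteration using triangular membership functions
    centered at `centers`, then return the average discounted return of the
    greedy policy, simulated from a representative set of initial states.
    """
    # Dummy smooth objective so the sketch runs end-to-end.
    target = np.linspace(-1.0, 1.0, len(centers))
    return -np.sum((np.sort(centers) - target) ** 2)


def cross_entropy_optimize(n_centers=5, n_samples=50, n_elite=10,
                           n_iterations=30, seed=0):
    rng = np.random.default_rng(seed)
    mean = np.zeros(n_centers)   # mean of the CE sampling distribution
    std = np.ones(n_centers)     # standard deviation per center

    for _ in range(n_iterations):
        # Sample candidate center vectors and score each one.
        samples = rng.normal(mean, std, size=(n_samples, n_centers))
        scores = np.array([evaluate_policy_return(s) for s in samples])

        # Keep the elite (best-scoring) candidates and refit the distribution.
        elite = samples[np.argsort(scores)[-n_elite:]]
        mean, std = elite.mean(axis=0), elite.std(axis=0) + 1e-6

    return np.sort(mean)  # optimized membership-function centers


if __name__ == "__main__":
    print("optimized centers:", cross_entropy_optimize())
```

In this kind of scheme, the number of membership functions stays fixed and only their placement is optimized, which matches the abstract's emphasis on optimizing the shape of a constant number of basis functions rather than growing the basis.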