reinforcement learning; large language models; RL; LLMs
Abstract :
[en] This course is the eighth part of the 2024 edition of INFO8003-1 Optimal decision making for complex problems. It briefly reviews key concepts in reinforcement learning and then explains how to apply them to improve language models in various respects. In particular, it covers the use of reward models and direct preference optimisation in conjunction with preference data.
Research Center/Unit :
Montefiore Institute - Montefiore Institute of Electrical Engineering and Computer Science - ULiège
Disciplines :
Computer science
Author, co-author :
Pirenne, Lize ; Université de Liège - ULiège > Département d'électricité, électronique et informatique (Institut Montefiore) > Smart grids
Language :
English
Title :
Reinforcement Learning and Large Language Models
Publication date :
2024
Number of pages :
40
Course title or code :
INFO8003-1 Optimal decision making for complex problems
Institution :
ULiège - Université de Liège [School of Engineering], Liège, Belgium