Reinforcement Learning (3 ECTS)

Description

Reinforcement learning is an area of machine learning in which an agent interacts repeatedly with an environment in order to maximize its cumulative reward. In contrast to the classical supervised and unsupervised learning frameworks, we are typically interested in problems in which an agent makes decisions and learns at the same time, a paradigm also known as online learning (where a typical tradeoff is the exploration versus exploitation dilemma). Applications of reinforcement learning span many areas of artificial intelligence. For instance, driving a car or designing a computer program that plays the game of Go can be approached with reinforcement learning techniques.

The goal of this course is to provide an overview of the main tools used to tackle these problems. The course has a strong theoretical component. We will cover the basics of online optimization (multi-armed bandit algorithms, regret minimization) and of Markov decision processes, including Bellman’s optimality principle and basic learning algorithms for Markov decision processes. Throughout the course, we will focus on the mathematical and algorithmic aspects of the theory. We will also present implementation tutorials for the course’s algorithmic content.

Program

Part I: Online Optimization

In this part, we will introduce the concept of online learning algorithms. We will define the notion of regret -- which is central to online learning theory -- and explain how to construct low-regret algorithms. Notions:

The multi-armed bandit framework.
Regret minimization.
Upper and lower regret bounds and how to achieve them.
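
As an illustration of the notions above, here is a minimal sketch of a low-regret bandit algorithm. It uses UCB1, one well-known index policy, on Bernoulli arms; the choice of UCB1, the function name `ucb1`, and the parameters `means`, `horizon` and `seed` are assumptions made for this example and are not taken from the course material. The quantity tracked is the pseudo-regret, i.e. the sum over rounds of the gap between the best arm's mean and the mean of the arm actually pulled.

```python
import math
import random


def ucb1(means, horizon, seed=0):
    """Run UCB1 on Bernoulli arms; return (pull counts, pseudo-regret).

    `means` holds the true success probabilities. They are used only to
    simulate rewards and to measure regret, never by the decision rule.
    """
    rng = random.Random(seed)
    n_arms = len(means)
    counts = [0] * n_arms   # number of pulls per arm
    sums = [0.0] * n_arms   # total observed reward per arm
    best = max(means)
    regret = 0.0

    for t in range(1, horizon + 1):
        if t <= n_arms:
            arm = t - 1     # pull each arm once to initialise the estimates
        else:
            # Optimism in the face of uncertainty:
            # empirical mean plus an exploration bonus.
            arm = max(
                range(n_arms),
                key=lambda a: sums[a] / counts[a]
                + math.sqrt(2.0 * math.log(t) / counts[a]),
            )
        reward = 1.0 if rng.random() < means[arm] else 0.0
        counts[arm] += 1
        sums[arm] += reward
        regret += best - means[arm]   # expected loss of this pull
    return counts, regret


if __name__ == "__main__":
    counts, regret = ucb1([0.2, 0.8], horizon=5000)
    print("pulls per arm:", counts)
    print("pseudo-regret:", round(regret, 1))
```

In this sketch the suboptimal arm should be pulled only a logarithmic number of times in the horizon, whereas a policy that pulls arms uniformly at random would incur regret that grows linearly.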

Part II: Markov Decision Processes and Reinforcement Learning

Reinforcement learning is classically framed in the context of Markov decision processes. In this part, we will define what a Markov decision process is and how it can be used to construct powerful control algorithms. Notions:

Markov decision processes and Bellman's optimality principle.
Model-free reinforcement learning algorithms (Q-learning, TD-learning, deep Q-learning).
Model-based reinforcement learning (UCRL2, PSRL).
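
To make the model-free setting concrete, the following is a minimal sketch of tabular Q-learning with epsilon-greedy exploration. The environment is a hypothetical five-state chain invented for this example (moving right from state 3 reaches a terminal state and yields reward 1); the names `step` and `q_learning` and all hyperparameter values are likewise assumptions, not part of the course material. Each update moves Q(s, a) towards the Bellman target r + gamma * max over a' of Q(s', a').

```python
import random

N_STATES = 5       # states 0..4; state 4 is terminal (hypothetical toy chain)
ACTIONS = (0, 1)   # 0 = move left, 1 = move right


def step(state, action):
    """Deterministic toy dynamics: reward 1 on reaching the last state."""
    nxt = max(state - 1, 0) if action == 0 else state + 1
    done = nxt == N_STATES - 1
    return nxt, (1.0 if done else 0.0), done


def q_learning(episodes=500, alpha=0.5, gamma=0.9, epsilon=0.1, seed=0):
    """Tabular Q-learning; returns the learned table q[state][action]."""
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]
    for _ in range(episodes):
        state = 0
        for _ in range(100):   # cap on episode length
            if rng.random() < epsilon:
                action = rng.choice(ACTIONS)   # explore
            else:
                # Exploit, breaking ties at random.
                best = max(q[state])
                action = rng.choice(
                    [a for a in ACTIONS if q[state][a] == best]
                )
            nxt, reward, done = step(state, action)
            # Bellman target; no bootstrapping from a terminal state.
            target = reward + (0.0 if done else gamma * max(q[nxt]))
            q[state][action] += alpha * (target - q[state][action])
            state = nxt
            if done:
                break
    return q


if __name__ == "__main__":
    q = q_learning()
    policy = [max(ACTIONS, key=lambda a: q[s][a]) for s in range(N_STATES - 1)]
    print("greedy policy (1 = right):", policy)
```

The algorithm never uses the transition function directly, only sampled transitions, which is what distinguishes it from model-based methods that first estimate the dynamics.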

Evaluation