Mathematical Foundations of Machine Learning (6 ECTS)

Description and objective

Machine Learning is one of the key areas of Artificial Intelligence. It concerns the study and development of quantitative models that enable a computer to perform tasks without being explicitly programmed to do them. Learning, in this context, means recognizing complex patterns and making intelligent decisions. Given all possible inputs, the difficulty of this task lies in the fact that the set of all possible decisions is usually too complex to enumerate. To get around this, machine learning algorithms are designed to gain knowledge about the problem at hand from a limited set of observed data extracted from that problem.

To illustrate this principle, consider the supervised learning task, where a prediction function, which infers a predicted output for a given input, is learned over a finite set of labeled training examples. Each instance of this set is a pair consisting of a vector characterizing an observation in a given vector space and the desired response for that observation (also called the desired output). After the training step, the function returned by the algorithm should predict the outputs of new examples, not seen during learning, with the lowest possible probability of error. The underlying assumption is that the training examples are, in general, representative of the prediction problem on which the function will be applied. We expect the learning algorithm to produce a function with good generalization performance, rather than one that merely reproduces the outputs associated with the training examples. Guarantees of learnability for this process were studied in the theory of machine learning largely initiated by Vladimir Vapnik. These guarantees depend on the size of the training set and on the complexity of the class of functions in which the algorithm searches for the prediction function.

Emerging technologies, particularly those related to the development of the Internet, have reshaped the domain of machine learning with new learning frameworks designed to better tackle the related problems. One of these frameworks concerns learning with partially labeled data, or semi-supervised learning, whose development is motivated by the effort required to construct labeled training sets for some problems, while large amounts of unlabeled data can be gathered easily for those same problems. The inherent assumption is that unlabeled data contain relevant information about the task to be solved, so it is natural to try to extract this information in order to provide the learning algorithm with more evidence. From these observations arose a number of works that use a small amount of labeled data together with a large amount of unlabeled data to learn a prediction function.
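To make the supervised setting above concrete, it can be written in the standard formalization of statistical learning theory (a brief sketch; the symbols below are our own choices, since the description above does not fix any notation):

    % Training set of n i.i.d. labeled examples drawn from an unknown
    % distribution D over inputs x and desired outputs y:
    %   S = {(x_1, y_1), ..., (x_n, y_n)},  (x_i, y_i) ~ D
    \[ R(f) = \mathbb{E}_{(x,y)\sim\mathcal{D}}\big[\ell(f(x),y)\big] \qquad \text{(true risk, unobservable)} \]
    \[ \hat{R}_S(f) = \frac{1}{n}\sum_{i=1}^{n} \ell(f(x_i),y_i) \qquad \text{(empirical risk on } S\text{)} \]
    % Vapnik-style guarantees bound the gap between R(f) and \hat{R}_S(f)
    % uniformly over the function class, in terms of the sample size n and a
    % complexity measure of the class (e.g. its VC dimension) -- the two
    % factors named in the paragraph above.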

The intent of this course is to offer a broad introduction to the field of Machine Learning, including discussions of each of the major frameworks: supervised, unsupervised, semi-supervised, and reinforcement learning.

Program

Part I. Supervised, unsupervised and semi-supervised Learning

This part gives an overview of the foundations of supervised learning. We will see that learning is an inductive process in which a general rule is found from a finite set of labeled observations by minimizing the empirical risk of the rule over that set. The study of consistency gives conditions under which, in the limit of infinite sample sizes, the minimizer of the empirical risk leads to a value of the risk that is as good as the best attainable risk. The direct minimization of the empirical risk is not tractable, as the latter is not differentiable, so learning algorithms find the parameters of the learning rule by minimizing a convex upper bound (or surrogate) of the empirical risk instead. We present classical strategies for unconstrained convex optimization: gradient descent, quasi-Newton approaches, and conjugate gradient descent.

We then present classical learning algorithms for binary classification (the perceptron, logistic regression, and boosting), linking the development of these models to the Empirical Risk Minimization framework, as well as the multi-class classification paradigm. In particular, we present the Multi-Layer Perceptron together with the back-propagation algorithm that is in use in deep learning.

For unsupervised and semi-supervised learning, we will present generative models for clustering as well as two powerful tools for parameter estimation, namely the Expectation-Maximization (EM) and Classification Expectation-Maximization (CEM) algorithms. In the context of Big Data, labeling observations for learning is a tedious task; the semi-supervised paradigm aims at learning from few labeled and a huge amount of unlabeled data. In this part we review the three families of techniques proposed in semi-supervised learning, namely graph-based, generative, and discriminative models.
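As a concrete illustration of surrogate minimization, here is a minimal sketch (our own toy example, not course material; the dataset, step size, and iteration count are arbitrary choices) that trains a logistic regression classifier by plain gradient descent on the logistic loss, a convex upper bound of the 0-1 empirical risk:

    import numpy as np

    def train_logistic_regression(X, y, lr=0.1, n_iters=1000):
        """Binary classifier with labels y in {-1, +1}.

        Minimizes the logistic loss, a convex surrogate of the 0-1 loss:
            (1/n) * sum_i log(1 + exp(-y_i * <w, x_i>))
        by plain gradient descent.
        """
        n, d = X.shape
        w = np.zeros(d)
        for _ in range(n_iters):
            margins = y * (X @ w)                  # y_i * <w, x_i>
            # Gradient of the averaged logistic loss with respect to w:
            coef = -y / (1.0 + np.exp(margins))
            grad = (X * coef[:, None]).mean(axis=0)
            w -= lr * grad
        return w

    # Toy usage: two Gaussian blobs labeled -1 / +1.
    rng = np.random.default_rng(0)
    X = np.vstack([rng.normal(-1, 1, (50, 2)), rng.normal(+1, 1, (50, 2))])
    y = np.concatenate([-np.ones(50), np.ones(50)])
    w = train_logistic_regression(X, y)
    accuracy = np.mean(np.sign(X @ w) == y)

No bias term is included, to keep the sketch short; in practice one appends a constant feature to X.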

More information on this part: here.

Part II. Reinforcement Learning

In this part, we will introduce the concept of online learning algorithms and the framework of Markov decision processes. For the former, we will define the notion of regret, which is central to online learning theory, and explain how to construct low-regret algorithms (a minimal bandit sketch follows the list below). Notions:

  • The multi-armed bandit framework.
  • Regret minimization.
  • Upper and lower regret bounds and how to achieve them.
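To give a feel for these notions, here is a minimal sketch of UCB1, one classical index policy achieving a logarithmic regret upper bound (the Gaussian reward model, arm means, and horizon below are illustrative assumptions of ours):

    import numpy as np

    def ucb1(arm_means, horizon, rng):
        """UCB1 on a stochastic multi-armed bandit with Gaussian rewards.

        At each step, pull the arm maximizing the optimistic index
        mean_estimate + sqrt(2 * log(t) / pulls), which keeps the regret
        (the gap to always pulling the best arm) logarithmic in the horizon.
        """
        k = len(arm_means)
        pulls = np.zeros(k)
        sums = np.zeros(k)
        for t in range(1, horizon + 1):
            if t <= k:                  # pull each arm once to initialize
                arm = t - 1
            else:
                ucb = sums / pulls + np.sqrt(2 * np.log(t) / pulls)
                arm = int(np.argmax(ucb))
            r = rng.normal(arm_means[arm], 1.0)
            pulls[arm] += 1
            sums[arm] += r
        # Expected regret: reward of always playing the best arm, minus the
        # expected reward of the arms actually pulled.
        return horizon * max(arm_means) - float(pulls @ np.asarray(arm_means))

    rng = np.random.default_rng(0)
    print(ucb1([0.2, 0.5, 0.8], horizon=10_000, rng=rng))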

For the latter, we will define what a Markov decision process is and how it can be used to construct powerful control algorithms (a tabular Q-learning sketch follows the list below). Notions:

  • Markov decision processes and Bellman's optimality principle.
  • Model-free reinforcement learning algorithms (Q-learning, TD-learning, deep Q-learning).
  • Model-based reinforcement learning (UCRL2, PSRL).
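As one model-free example, here is a minimal tabular Q-learning sketch (the chain environment, learning rate, and exploration rate are our own illustrative choices, not course material):

    import numpy as np

    def q_learning(n_states=5, n_actions=2, episodes=500,
                   gamma=0.9, alpha=0.1, eps=0.1, seed=0):
        """Tabular Q-learning on a toy chain MDP.

        States 0..n_states-1; action 0 moves left, action 1 moves right;
        reaching the last state yields reward 1 and ends the episode.
        The update Q(s,a) += alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))
        drives Q toward the fixed point of Bellman's optimality equation.
        """
        rng = np.random.default_rng(seed)
        Q = np.zeros((n_states, n_actions))
        for _ in range(episodes):
            s = 0
            while s != n_states - 1:
                # Epsilon-greedy exploration over the current Q estimates.
                a = rng.integers(n_actions) if rng.random() < eps else int(np.argmax(Q[s]))
                s_next = min(s + 1, n_states - 1) if a == 1 else max(s - 1, 0)
                r = 1.0 if s_next == n_states - 1 else 0.0
                Q[s, a] += alpha * (r + gamma * Q[s_next].max() - Q[s, a])
                s = s_next
        return Q

    Q = q_learning()
    greedy_policy = Q.argmax(axis=1)   # should learn to always move right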

Evaluation

  • Homework assignments (30% of the grade)
  • Final exam (70% of the grade)

Lecturers
