Winter School on Optimization and Operations Research

Prof. Dolores Romero Morales Prof. Simon Lacoste-Julien

Mathematical Optimization and Data Science

Recent advances in large scale convex optimization and applications to machine learning

Prof. Dolores Romero Morales

Prof. Simon Lacoste-Julien

Data Science aims to develop tools to represent and extract knowledge from complex data to aid Decision Making. Mathematical Optimization has played a crucial role across the three main pillars of Data Science, namely Supervised Learning, Unsupervised Learning and Information Visualization. For instance, Quadratic Programming is used in Support Vector Machines, a Supervised Learning tool. Mixed-Integer Programming is used in Clustering, an Unsupervised Learning task. Global Optimization is used in MultiDimensional Scaling, an Information Visualization tool. The increasing complexity of the data and the sophistication of the questions posed by users calls for innovative mathematical optimization modeling and cutting-edge numerical optimization methods. During these five lectures, we will navigate through some of the latest developments of Mathematical Optimization in Data Science, with a special focus on classification and information visualization.

The first lecture will introduce the main challenges in Data Science and the implications for the Mathematical Optimization community.

During the second and the third lectures, we will address the problem of classifying individuals into a number of classes. The individuals have attached a set of features. We aim to build Support Vector Machine-type classification models, which perform well in terms of accuracy as well as other relevant criteria to the user such as interpretability and costs.

During the last two lectures, we will address the problem of visualizing a set of individuals. The individuals have attached complex data. We aim to build visualization frameworks, which faithfully represent the data by means of (dynamic) plots.

Machine learning has become a popular application domain for modern optimization techniques, pushing the frontier of optimization algorithms. The need for large scale optimization algorithms which can handle millions of dimensions or data points, typical for the big data era, have brought a resurgence of interest for first order algorithms, making us revisit the venerable stochastic gradient method [Robbins-Monro 1951] as well as the Frank-Wolfe algorithm [Frank-Wolfe 1956], and yielding recent improvements on them. In this short course, we will survey these recent developments, and motivate them from concrete machine learning problems, highlighting the interesting interplay between machine learning and optimization.

We will start by presenting the standard machine learning optimization framework (for supervised learning) coming from regularized empirical risk minimization, and review standard convex optimization results. We will then present incremental gradient methods and block-coordinate approaches, to handle the large scale datasets. We will present the recent development of variance reduction for incremental gradient methods, which yields the state-of-the-art optimization algorithm for several machine learning problems.

We will then review the Frank-Wolfe algorithm and explain its usefulness to efficiently handle the combinatorial structure present in many machine learning problems. We will finally present some of its recent extensions, both algorithmic and theoretical, motivated by the "structured prediction" problem from machine learning that yields a challenging optimization structure.

Important dates

December 16, 2017:

January 12, 2018:

January 14-19, 2018:
deadline for early registration
deadline for late registration



Mila Bender

Transport and Mobility Laboratory (TRANSP-OR)
Station 18
CH-1015 Lausanne