Reinforcement learning (RL) has been in the limelight with many recent breakthroughs in artificial intelligence, yet the theoretical understanding of many classical RL algorithms and their modern deep RL counterparts remains quite limited. Recently, fundamental concepts from optimization and dynamical systems theory have provided fresh perspectives on understanding their convergence behavior, as well as on developing provably efficient RL algorithms.
This short course will introduce some recent theoretical and algorithmic developments in RL from the optimization and dynamical systems perspectives. Tentative topics include:
- Overview of RL basics (dynamic programming, TD-learning, Q-learning, policy gradient, etc.); see the brief sketch after this list
- Unified ODE analysis of RL algorithms (TD-learning and Q-learning variants)
- Stochastic optimization frameworks for RL
- Primal-dual optimization frameworks for RL
- Optimization for RL from a single agent to cooperative agents
- Extension to the theory of deep RL
- Open problems
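To make the basic updates in the first topic concrete, here is a minimal, illustrative sketch (not part of the course materials) of tabular TD(0) policy evaluation and Q-learning. The toy chain MDP, step size, discount factor, and exploration rate below are arbitrary choices made purely for illustration.

```python
# Minimal sketch of tabular TD(0) and Q-learning on a toy chain MDP.
# All problem data and hyperparameters are illustrative assumptions.
import numpy as np

n_states, n_actions = 5, 2
gamma, alpha = 0.9, 0.1          # discount factor and step size (illustrative)
rng = np.random.default_rng(0)

def step(s, a):
    """Toy chain MDP: action 0 moves left, action 1 moves right; reward 1 at the right end."""
    s_next = max(0, s - 1) if a == 0 else min(n_states - 1, s + 1)
    reward = 1.0 if s_next == n_states - 1 else 0.0
    return s_next, reward

# TD(0): evaluate a fixed (uniformly random) policy
V = np.zeros(n_states)
s = 0
for _ in range(5000):
    a = rng.integers(n_actions)
    s_next, r = step(s, a)
    V[s] += alpha * (r + gamma * V[s_next] - V[s])          # TD(0) update
    s = s_next

# Q-learning: off-policy control with an epsilon-greedy behavior policy
Q = np.zeros((n_states, n_actions))
s = 0
for _ in range(5000):
    a = rng.integers(n_actions) if rng.random() < 0.1 else int(Q[s].argmax())
    s_next, r = step(s, a)
    Q[s, a] += alpha * (r + gamma * Q[s_next].max() - Q[s, a])  # Q-learning update
    s = s_next

print(V.round(2))
print(Q.round(2))
```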
Papers for the workshops (click to access):
- Junchi Yang, Siqi Zhang, Negar Kiyavash, Niao He. "A catalyst framework for minimax optimization." Advances in Neural Information Processing Systems, 2020. https://proceedings.neurips.cc/paper/2020/file/3db54f5573cd617a0112d35dd1e6b1ef-Paper.pdf
- Donghwan Lee and Niao He. "Target-Based Temporal Difference Learning." International Conference on Machine Learning, PMLR 97:3713-3722, 2019. http://proceedings.mlr.press/v97/lee19a/lee19a.pdf
- Yifan Hu, Xin Chen, and Niao He. "Sample complexity of sample average approximation for conditional stochastic optimization." SIAM Journal on Optimization 30, no. 3: 2103-2133, 2020. https://epubs.siam.org/doi/pdf/10.1137/19M1284865
- Semih Cayci, Niao He, R. Srikant. "Linear Convergence of Entropy-Regularized Natural Policy Gradient with Linear Function Approximation." arXiv preprint arXiv:2106.04096, 2021. https://arxiv.org/pdf/2106.04096.pdf
In this course we present the latest achievements in the theory of high-order schemes. We start from the global complexity estimates for second-order methods based on the cubic regularization of the quadratic model of the objective function. We consider several complexity bounds for different classes of nonconvex functions. The next topic is accelerated second-order methods for convex functions and the corresponding lower complexity bounds. After that, we pass to universal second-order schemes. We finish the course with implementable tensor methods.
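As a rough illustration of the cubic regularization idea from the first lectures (and from the Nesterov-Polyak paper listed below), here is a minimal sketch of one cubic-regularized Newton step: the next iterate minimizes the quadratic model of the objective augmented with a cubic penalty. The test function, the regularization parameter M, and the use of a generic numerical solver for the subproblem are illustrative assumptions, not the methods covered in the course.

```python
# Minimal sketch of a cubic-regularized Newton step:
#   x_next = x + argmin_s  <g, s> + 0.5 s^T H s + (M/6) ||s||^3,
# where g and H are the gradient and Hessian at x.
# The test function and M are arbitrary illustrative choices.
import numpy as np
from scipy.optimize import minimize

def f(x):
    # simple smooth nonconvex test function (illustrative)
    return x[0] ** 4 + x[0] * x[1] + (1.0 + x[1]) ** 2

def grad(x):
    return np.array([4 * x[0] ** 3 + x[1], x[0] + 2 * (1.0 + x[1])])

def hess(x):
    return np.array([[12 * x[0] ** 2, 1.0], [1.0, 2.0]])

def cubic_newton_step(x, M):
    """Solve the cubic-regularized model numerically and return the next iterate."""
    g, H = grad(x), hess(x)
    model = lambda s: g @ s + 0.5 * s @ H @ s + (M / 6.0) * np.linalg.norm(s) ** 3
    s = minimize(model, np.zeros_like(x)).x   # generic solver for the subproblem
    return x + s

x = np.array([1.0, 1.0])
for _ in range(10):
    x = cubic_newton_step(x, M=10.0)
print(x, f(x))
```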
Papers for the workshops (click to access):
- Yu. Nesterov, B. Polyak. Cubic regularization of Newton's method and its global performance. Mathematical Programming, 108(1), 177-205 (2006). https://link.springer.com/article/10.1007/s10107-006-0706-8
- Yu. Nesterov. Accelerating the cubic regularization of Newton's method on convex problems. Mathematical Programming, 112(1), 159-181 (2008). https://link.springer.com/article/10.1007/s10107-006-0089-x
- H. Lu, R. Freund, and Yu. Nesterov. Relatively smooth convex optimization by first-order methods. SIAM Journal on Optimization, 28(1), 333-354 (2018). https://epubs.siam.org/doi/abs/10.1137/16M1099546
- Yu. Nesterov. Implementable tensor methods in unconstrained convex optimization. Mathematical Programming, 186, 157-183 (2021). DOI 10.1007/s10107-019-01449-1. https://link.springer.com/article/10.1007/s10107-019-01449-1
- On Sunday, a welcome cocktail is organized at 18:30 in the main restaurant of the Hotel Europe.
- Two 90-minute lectures are organized every morning.
- On Monday and Tuesday, workshops are organized to work on papers of the two lecturers. See details below.
- On Wednesday and Thursday, the groups present the papers prepared during the workshops.
- On Friday, we adjourn the winter school at 12:00.
- On Thursday evening, the workshop dinner is organized at 19:30.
Each presentation has a slot of 40 minutes, distributed as follows:
- 25 minutes of presentation,
- 15 minutes of questions and discussion.
- Wednesday, 17:00-17:40
- Group Y1:
- Jusup Matej (ETH Zurich),
- Qorbanian Roozbeh (University of Luxembourg),
- Li Mengmeng (EPFL, RAO),
- Kang Zhao (TU Eindhoven),
- Schnidrig Jonas (EPFL, IPESE).
- Yu. Nesterov, B. Polyak. Cubic regularization of Newton's method and its global performance. Mathematical Programming, 108(1), 177-205 (2006). https://link.springer.com/article/10.1007/s10107-006-0706-8
- Wednesday, 17:40-18:20
- Group N1:
- Kamoutsi Angeliki (ETH Zurich),
- Abdolhamidi Dorsa (EPFL, OES),
- Krawczuk Igor (EPFL, LIONS),
- Gajda Mikele (University of Lausanne),
- Behmandpoor Pourya (KU Leuven).
- Junchi Yang, Siqi Zhang, Negar Kiyavash, Niao He. "A catalyst framework for minimax optimization." Advances in Neural Information Processing Systems, 2020. https://proceedings.neurips.cc/paper/2020/file/3db54f5573cd617a0112d35dd1e6b1ef-Paper.pdf
- Thursday, 17:00-17:40
- Group Y2:
- Rychener Yves (EPFL, RAO),
- Tschernutter Daniel (ETH Zurich),
- Han Jun (EPFL, OES),
- Gender Alexander (ETH Zurich).
- Yu. Nesterov. Accelerating the cubic regularization of Newton's method on convex problems. Mathematical Programming, 112(1) 159-181 (2008) https://link.springer.com/article/10.1007/s10107-006-0089-x
- Thursday, 17:40-18:20
- Group N2:
- Taskesen Bahar (EPFL, RAO),
- Schneider Philipp (EPFL, OES),
- Demiray Onur (EPFL, TOM),
- Lepour Dorsan (EPFL, IPESE).
- Donghwan Lee and Niao He. "Target-Based Temporal Difference Learning." International Conference on Machine Learning, PMLR 97:3713-3722, 2019. http://proceedings.mlr.press/v97/lee19a/lee19a.pdf
Time | Monday | Tuesday | Wednesday | Thursday | Friday |
---|---|---|---|---|---|
08:30-10:00 | Yurii | Niao | Yurii | Niao | Yurii |
10:00-10:30 | Coffee break | | | | |
10:30-12:00 | Niao | Yurii | Niao | Yurii | Niao |
12:00-17:00 | Sport and discussions | | | | |
17:00-19:00 | Workshops | Workshops | Student presentations | Student presentations | |
19:30 | Dinner | | | | |