DSC 243 – Advanced Optimization
Instructor: Dmitriy Drusvyatskiy
Lectures: MWF 10:00–10:50 AM in PCYNH 121
Office Hour: Monday 11:00 AM in HDSI 330
Course summary
This course develops the foundations of the modern optimization methods that form the backbone of machine learning and AI. Building on the prerequisite introductory course in optimization (DSC 211), this course focuses on the design, analysis, and practical behavior of algorithms for large-scale, high-dimensional problems. The emphasis is on gaining both theoretical insight and algorithmic intuition—understanding why methods work, how to choose between them, and how they scale to real-world settings. By the end of the course, students will be equipped to think critically about optimization in contemporary applications, from training machine learning models to solving complex decision and inference problems.
Pre-requisites
- Introductory course on optimization (DSC 211 or equivalent)
- Comfort with advanced linear algebra and multivariate calculus
Course outline
| Week(s) | Topics | Lecture notes and HW |
| --- | --- | --- |
| 1 | Gradient descent; acceleration by Chebyshev stepsizes; conjugate gradient method | Week 1 Notes, HW1 |
| 2–3 | Proximal gradient descent and accelerated gradient descent for convex optimization; lower complexity bounds for convex optimization; mirror descent and accelerated mirror descent; optimization of relatively smooth functions | |
| 4–5 | Mirror descent for nonsmooth convex functions; Frank-Wolfe algorithm as subgradient method in the dual | |
| 7 | Smoothing algorithms; monotone operators and variational inequalities; extragradient and Chambolle-Pock algorithms | |
| 8 | Stochastic gradient method; coordinate descent; variance reduction: SVRG, SPIDER, STORM; adaptive algorithms: ADAGRAD, ADAM, RMSPROP | |
| 9–10 | SGD and STORM for nonconvex stochastic optimization; Polyak–Łojasiewicz condition and convergence of gradient systems; examples in deep learning, sampling, control, and reinforcement learning | |
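For a flavor of the Week 1 material, the sketch below runs plain gradient descent with a constant stepsize on a small least-squares problem. The problem data, stepsize rule, and iteration count are illustrative choices for this example only, not part of the course material.

```python
import numpy as np

# Illustrative least-squares problem: minimize f(x) = 0.5 * ||A x - b||^2.
rng = np.random.default_rng(0)
A = rng.standard_normal((50, 10))
b = rng.standard_normal(50)

# Constant stepsize 1/L, where L = ||A||_2^2 is the Lipschitz constant of grad f.
L = np.linalg.norm(A, 2) ** 2
step = 1.0 / L

x = np.zeros(10)
for _ in range(500):
    grad = A.T @ (A @ x - b)  # gradient of f at the current iterate
    x -= step * grad

# Compare against the least-squares solution computed directly.
x_star, *_ = np.linalg.lstsq(A, b, rcond=None)
print(np.linalg.norm(x - x_star))
```

With the 1/L stepsize, gradient descent converges at the standard O(1/k) rate in function value for smooth convex problems; the accelerated and conjugate gradient variants covered in Week 1 improve on this for quadratics.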
Grading policy
All grades will be based on homework sets, which will be due roughly every two weeks.
Canvas
I will use Canvas only to communicate with the class by email. All course material will appear on this webpage.
Textbooks and references
This course will largely be self-contained with no required textbook. Relevant sources and follow-up reading material will be cited during the lectures. In particular, I will draw on some material in these sources:
- A. Beck. Introduction to Nonlinear Optimization: Theory, Algorithms, and Applications. SIAM, 2014.
- S. Bubeck. Convex Optimization: Algorithms and Complexity.
- S. Boyd and L. Vandenberghe. Convex Optimization.
- D. Drusvyatskiy. Convex Analysis and Nonsmooth Optimization.
- J. Duchi. Introductory Lectures on Stochastic Optimization.
- Y. T. Lee and S. Vempala. Techniques in Optimization and Sampling.
- G. Lan. First-order and Stochastic Optimization Methods for Machine Learning. Springer, 2020.
- Y. Nesterov. Introductory Lectures on Convex Optimization: A Basic Course. Springer, 2004.
- B. Recht and S. J. Wright. Optimization for Data Analysis. Cambridge University Press, 2022.
- A. Sidford. Optimization Algorithms.