Instructor: Dmitriy Drusvyatskiy
Lectures: MWF 10:00–10:50 AM in PCYNH 121
Office Hour: Monday 11:00 AM in HDSI 330
This course develops the foundations of the modern optimization methods that form the backbone of machine learning and AI. Building on the prerequisite introductory course in optimization (DSC 211), it takes the convex quadratic optimization problem as its central object of study. Optimization of quadratics is the most fundamental problem in numerical optimization, yet many of the phenomena for this particular problem class have direct analogues for highly nonlinear and complex models (e.g. deep learning). Since the objective is quadratic, we can develop sharp intuition for the design, analysis, and practical behavior of algorithms using only basic linear-algebraic tools. We will study both worst-case convergence rates—depending only on extremal eigenvalues—and more refined guarantees that account for the interaction between initialization and the shape of the entire spectrum, covering gradient descent, Chebyshev acceleration, conjugate gradients, stochastic gradient methods, lower bounds, and the high-dimensional limits that connect these ideas to large-scale machine learning. The emphasis throughout is on gaining both theoretical insight and algorithmic intuition—understanding why methods work, how to choose between them, and how they scale to real-world settings.
The notes for the course are being developed. Here is the latest draft: notes.
| Week(s) | Topics | Notes |
|---|---|---|
| 1–3 |
| Lecture notes |
| 4–6 |
| Lecture notes |
| 7-8 |
| Lecture notes |
| 9-10 |
| Lecture notes |
All grades will be based on homework sets, which will be due roughly every two weeks.
I will use Canvas only for communicating by email with the class. All course material will appear on this webpage.
This course is largely self-contained with no required textbook. Specific results are cited inline in the lecture notes; the sources below are background and follow-up reading, grouped by the topics they support.
First-order and stochastic optimization (gradient descent, acceleration, SGD).
Krylov subspace methods and conjugate gradients.
Spectral structure, inverse problems, and random matrices.
High-dimensional limits of SGD.