DSC 243 – Advanced Optimization

Instructor: Dmitriy Drusvyatskiy

Lectures: MWF 10:00–10:50 AM in PCYNH 121

Office Hour: Monday 11:00 AM in HDSI 330

Table of contents

Course summary

This course develops the foundations of the modern optimization methods that form the backbone of machine learning and AI. Building on the prerequisite introductory course in optimization (DSC 211), it takes the convex quadratic optimization problem as its central object of study. Optimization of quadratics is the most fundamental problem in numerical optimization, yet many of the phenomena for this particular problem class have direct analogues for highly nonlinear and complex models (e.g. deep learning). Since the objective is quadratic, we can develop sharp intuition for the design, analysis, and practical behavior of algorithms using only basic linear-algebraic tools. We will study both worst-case convergence rates—depending only on extremal eigenvalues—and more refined guarantees that account for the interaction between initialization and the shape of the entire spectrum, covering gradient descent, Chebyshev acceleration, conjugate gradients, stochastic gradient methods, lower bounds, and the high-dimensional limits that connect these ideas to large-scale machine learning. The emphasis throughout is on gaining both theoretical insight and algorithmic intuition—understanding why methods work, how to choose between them, and how they scale to real-world settings.

Pre-requisites

Notes

The notes for the course are being developed. Here is the latest draft: notes.

Course outline

Week(s)TopicsNotes
1–3
  • Gradient descent
  • Acceleration by Chebyshev stepsizes
  • Conjugate Gradient method
Lecture notes
4–6
  • Improved rates under source and spectral conditions (e.g. Kernel regression)
  • Average case analysis (Marchenko Pastur Law)
  • Stochastic gradient method
  • Acceleration of SGD by averaging
Lecture notes
7-8
  • Lower bound for deterministic algorithms
  • Lower bounds for deterministic algorithms with a prescribed spectral measure
  • Lower bounds for stochastic optimization
Lecture notes
9-10
  • High-dimensional limit of SGD and consequences
Lecture notes

Grading policy

All grades will be based on homework sets, which will be due roughly every two weeks.

Canvas

I will use Canvas only for communicating by email with the class. All course material will appear on this webpage.

Textbooks and references

This course is largely self-contained with no required textbook. Specific results are cited inline in the lecture notes; the sources below are background and follow-up reading, grouped by the topics they support.

First-order and stochastic optimization (gradient descent, acceleration, SGD).

Krylov subspace methods and conjugate gradients.

Spectral structure, inverse problems, and random matrices.

High-dimensional limits of SGD.