This is the first post in a new series on Performance Estimation Problems (PEP). I’ve divided the series into two parts: the first introduces the PEP framework, and the second covers applications of PEP, including convergence proofs and stepsize optimization. Let’s start with the basics.
Introduction
The PEP framework was introduced in the seminal work by Drori and Teboulle:
[1] Drori, Yoel, and Marc Teboulle. ‘‘Performance of first-order methods for smooth convex minimization: a novel approach.” Mathematical Programming 145.1 (2014): 451-482.
In a nutshell, PEP is an innovative analytical tool for assessing the convergence of first-order methods. Traditional convergence proofs often rely on single-step descent properties and telescoping techniques, which can lead to conservative stepsize choices (to ensure descent in each step). PEP, on the other hand, analyzes the worst case over multiple steps jointly through computer-aided proofs, offering a potentially tighter and more flexible approach.
High-Level Idea
Firstly, to establish a convergence proof, we need the following components:
- Objective: a worst-case performance criterion
- Constraint 1: problem instance $\in$ some function class
- Constraint 2: the iterates are generated by the first-order algorithm
- Constraint 3: starting from a reasonable point
More formally, these components can be translated as:
- Performance measure $\mathcal{P}$: e.g., the function value gap $f (x_N) - f (x^{\ast})$, the residual subgradient norm $\| g (x_N) \|^2$, or the distance to optimality $\| x_N - x^{\ast} \|^2$.
- Function class $\mathcal{F}$: properties of $f$, such as convexity, Lipschitz continuity, and smoothness.
- Initialization condition $\mathcal{I}$: e.g., the function value gap $f (x_0) - f (x^{\ast}) \leq R$, the residual subgradient norm $\| g (x_0) \|^2 \leq R^2$, or the distance to optimality $\| x_0 - x^{\ast} \|^2 \leq R^2$.
- First-order method $\mathcal{M}$: each iterate depends only on the previous iterates $x_j$ and the oracle outputs $f (x_j), g (x_j)$.
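For concreteness, one choice of these four components, and the one instantiated in (P3) below, is gradient descent on smooth convex functions:
\[\mathcal{P} = f (x_N) - f (x^{\ast}), \quad \mathcal{F} = \{ f : f \ \text{convex and} \ L\text{-smooth} \}, \quad \mathcal{I} : \| x_0 - x^{\ast} \| \leq R, \quad \mathcal{M} : x_i = x_{i - 1} - \frac{h}{L} g (x_{i - 1}).\]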
Given these components, can we establish an optimization problem to automatically provide a convergence proof? This is precisely the role of Performance Estimation Problems (PEP).
PEP Formulation
We consider the problem of minimizing a function $f \in \mathcal{F}$ in the first-order setting, which means we only have access to function values and gradients. The first-order oracle output at the iterates is represented as $\{ x_i, f (x_i), g (x_i) \}$.
In the most general sense, PEP is formulated as
\[\begin{array}{rl} (P1)\quad \displaystyle\max_{x_0, {\color{red}{f}}} & \mathcal{P} (\{ x_i, f (x_i), g (x_i) \}) \\ \text{s.t.} & f \in \mathcal{F} \\ & \mathcal{I} (x_0, f (x_0), g (x_0)) \ \text{holds} \\ & x_i = \mathcal{M} (\{ x_j, f (x_j), g (x_j) \}_{j < i}) \\ & x^{\ast} \ \text{is a minimizer of} \ f \end{array}\]The challenge in solving (P1) is that the decision variable $f$ is infinite-dimensional. However, we can convert it into a finite-dimensional problem by optimizing solely over the outputs of the first-order oracle:
\[\begin{array}{rl} (P2)\quad \displaystyle\max_{\{ x_i, {\color{red}{f_i, g_i}} \}} & \mathcal{P} (\{ x_i, f_i, g_i \}) \\ \text{s.t.} & \exists f \in \mathcal{F}, \; f_i = f (x_i), \; g_i = g (x_i) \\ & \mathcal{I} (x_0, f_0, g_0) \ \text{holds} \\ & x_i = \mathcal{M} (\{ x_j, f_j, g_j \}_{j < i}) \\ & x^{\ast} \ \text{is a minimizer of} \ f \end{array}\]We claim that (P1) and (P2) are equivalent. Specifically, every solution $\{ x_i, f_i, g_i \}$ of (P2) comes with a function $f \in \mathcal{F}$ interpolating it, so that $(x_0, f)$ is feasible for (P1). Conversely, every feasible point of (P1) yields a feasible point of (P2), with the same objective value, by evaluating $f$ and $g$ at the iterates.
Now, we instantiate problem (P2) by specific choices of $(\mathcal{P}, \mathcal{F}, \mathcal{I}, \mathcal{M})$:
\[\begin{array}{rl} (P3)\quad \displaystyle\max_{\{ x_i, {\color{red}{f_i, g_i}} \}} & f_N - f_{\star} \\ \text{s.t.} & \exists f \in \mathcal{F}, \; f_i = f (x_i), \; g_i = g (x_i) \\ & \| x_0 - x_{\star} \| \leq R \\ & x_i = x_{i - 1} - \frac{h}{L} g_{i - 1} \\ & g_{\star} = 0 \end{array}\]where $\mathcal{F}$ is the class of convex and $L$-smooth functions, $h$ is the given (normalized) stepsize, and the method $\mathcal{M}$ is gradient descent; optimality of $x_{\star}$ is encoded by $g_{\star} = 0$.
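In practice, one rarely assembles such problems by hand. As a sketch, the snippet below sets up exactly (P3) using the third-party Python toolbox PEPit; this is an assumption on my part in that the class and method names come from that package rather than from [1], so consult its documentation for the exact interface. For $h = 1$ it should return approximately the known worst-case value $\frac{L R^2}{4 N + 2}$ from [1]:

```python
# A sketch of (P3) using the third-party PEPit toolbox (pip install pepit).
# Class/method names below are PEPit's, not from [1]; check its docs.
from PEPit import PEP
from PEPit.functions import SmoothConvexFunction

L, R, h, N = 1.0, 1.0, 1.0, 5      # smoothness, initial radius, stepsize, steps

problem = PEP()
f = problem.declare_function(SmoothConvexFunction, L=L)   # f in F

xs = f.stationary_point()          # x_star, with g_star = 0 enforced
fs = f(xs)                         # f_star
x0 = problem.set_initial_point()
problem.set_initial_condition((x0 - xs) ** 2 <= R ** 2)   # ||x_0 - x_star||^2 <= R^2

x = x0
for _ in range(N):                 # x_i = x_{i-1} - (h / L) * g_{i-1}
    x = x - (h / L) * f.gradient(x)

problem.set_performance_metric(f(x) - fs)                 # P = f(x_N) - f_star
worst_case = problem.solve(verbose=0)
print(worst_case)                  # ~= L * R**2 / (4 * N + 2) for h = 1, from [1]
```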
Interpolation Theory
So far, the transformations are equivalent; it remains to find necessary and sufficient conditions for the constraint
\[\exists f \in \mathcal{F}, \quad f_i = f (x_i), \; g_i = g (x_i),\]which is exactly what the convex interpolation conditions provide:
[2] Taylor, Adrien B., Julien M. Hendrickx, and François Glineur. ‘‘Smooth strongly convex interpolation and exact worst-case performance of first-order methods.” Mathematical Programming 161 (2017): 307-345.
We’ll begin with the simplest case: the interpolation conditions for the class of convex functions. The interpolation problem is: given first-order observations $\{ x_i, f_i, g_i \}$, can we find a convex function $f$ such that
\[f_i = f (x_i), \quad g_i = \nabla f (x_i)?\]We can easily identify a necessary condition
\[f_i - f_j \geq \langle g_j, x_i - x_j \rangle, \quad \forall i, j\]which follows from the first-order characterization of convexity: every tangent plane is a global underestimator. Remarkably, this condition is also sufficient, since we can explicitly construct a convex function that interpolates $\{ x_i, f_i, g_i \}$. Consider the following piecewise-linear function:
\[f (x) = \max_j \{ f_j + \langle g_j, x - x_j \rangle \}\]This function is convex as a pointwise maximum of affine functions, and the necessary condition above guarantees that it interpolates the data: the $j = i$ term attains $f_i$ at $x_i$, no other term exceeds it, and $g_i$ is a subgradient there. (A quick numerical sanity check follows below.)
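Here is a small self-contained sketch verifying this construction. The data are made up, sampled from $f (x) = \tfrac{1}{2} \| x \|^2$ purely for illustration, so they satisfy the necessary condition by construction:

```python
import numpy as np

# Made-up illustrative data sampled from f(x) = 0.5 * ||x||^2,
# so that f_i = 0.5 * ||x_i||^2 and g_i = x_i.
xs = [np.array([0.0, 0.0]), np.array([1.0, 0.0]), np.array([0.5, 1.0])]
gs = [x.copy() for x in xs]
fs = [0.5 * float(x @ x) for x in xs]

def convex_interpolable(xs, fs, gs, tol=1e-12):
    """Check the condition f_i - f_j >= <g_j, x_i - x_j> for all pairs (i, j)."""
    n = len(xs)
    return all(fs[i] - fs[j] >= gs[j] @ (xs[i] - xs[j]) - tol
               for i in range(n) for j in range(n))

def f_interp(x):
    """Piecewise-linear interpolant: pointwise max of the tangent planes."""
    return max(fj + gj @ (x - xj) for xj, fj, gj in zip(xs, fs, gs))

assert convex_interpolable(xs, fs, gs)
# The j = i plane attains f_i at x_i; the condition means no other plane exceeds it.
assert all(abs(f_interp(xi) - fi) < 1e-12 for xi, fi in zip(xs, fs))
print("interpolation verified at all data points")
```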
For the convex and $L$-smooth function class, the interpolation conditions become
\[f_i \geq f_j + \langle g_j, x_i - x_j \rangle + \frac{1}{2 L} \| g_i - g_j \|_2^2, \quad \forall i, j\]Equipped with these conditions, with the indices ranging over $\{ 0, \ldots, N, \star \}$, we can transform (P3) into the equivalent finite-dimensional problem
\[\begin{array}{rl} (P4)\quad \displaystyle\max_{\{ x_i, f_i, g_i \}} & f_N - f_{\star} \\ \text{s.t.} & f_i \geq f_j + \langle g_j, x_i - x_j \rangle + \frac{1}{2 L} \| g_i - g_j \|_2^2, \; \forall i, j \\ & \| x_0 - x_{\star} \|_2^2 \leq R^2 \\ & x_i = x_{i - 1} - \frac{h}{L} g_{i - 1} \\ & g_{\star} = 0 \end{array}\]Problem (P4) is now finite-dimensional, and it becomes tractable after a further reformulation into a semidefinite program (SDP) in terms of the Gram matrix of the vectors $(x_0 - x_{\star}, g_0, \ldots, g_N)$ and the function values. For more technical details, refer to [1].
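To close, here is a minimal sketch of this SDP reformulation, assuming numpy and cvxpy are available (the Gram-matrix parameterization follows [1, 2]; all variable names are mine). Taking $x_{\star} = 0$ without loss of generality, each iterate and gradient is a fixed linear combination of the basis vectors, so every constraint in (P4) is linear in $(G, f)$:

```python
# A minimal sketch of the SDP reformulation of (P4) with numpy + cvxpy
# (pip install cvxpy; an SDP-capable solver such as the bundled SCS is used).
# Following [1, 2]: take x_star = 0 WLOG and let G be the Gram matrix of
# (x_0 - x_star, g_0, ..., g_N); every constraint becomes linear in (G, f).
import numpy as np
import cvxpy as cp

L, R, h, N = 1.0, 1.0, 1.0, 2      # smoothness, radius, stepsize, steps
dim = N + 2                        # basis size: x_0 - x_star plus g_0..g_N

G = cp.Variable((dim, dim), symmetric=True)   # Gram matrix of the basis vectors
f = cp.Variable(N + 1)                        # f_i - f_star for i = 0..N

def x_coord(i):
    """Coordinates of x_i - x_star: x_i = x_0 - (h/L) * sum_{k<i} g_k."""
    v = np.zeros(dim); v[0] = 1.0; v[1:1 + i] = -h / L
    return v

def g_coord(i):
    """Coordinates of g_i; the star point has g_star = 0."""
    v = np.zeros(dim)
    if i != 'star':
        v[1 + i] = 1.0
    return v

points = list(range(N + 1)) + ['star']
xc = {i: np.zeros(dim) if i == 'star' else x_coord(i) for i in points}
fv = {i: 0 if i == 'star' else f[i] for i in points}

constraints = [G >> 0, G[0, 0] <= R ** 2]     # PSD; ||x_0 - x_star||^2 <= R^2
for i in points:
    for j in points:
        if i == j:
            continue
        gj, dg, dx = g_coord(j), g_coord(i) - g_coord(j), xc[i] - xc[j]
        # Interpolation: f_i >= f_j + <g_j, x_i - x_j> + (1/2L) ||g_i - g_j||^2
        constraints.append(fv[i] >= fv[j] + gj @ G @ dx + dg @ G @ dg / (2 * L))

prob = cp.Problem(cp.Maximize(f[N]), constraints)
prob.solve()
print(prob.value)   # matches the tight bound L * R**2 / (4N + 2) from [1] for h = 1
```

The design choice worth noting is that the iterates never appear as free variables: the method constraint of (P4) is baked into the coordinates of $x_i$, leaving only the Gram matrix and the function values as decision variables.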