STATS 606

Optimality conditions

Let \(\cC\subset\reals^d\) be closed, and \(f_0:\reals^d\to\reals\) be a continuously differentiable function. Consider the (possibly non-convex) general optimization problem

\[\begin{aligned} &\min\nolimits_{x\in\reals^d} & & f_0(x) \\ &\subjectto & & x\in\cC \end{aligned}.\]

Intuitively, if a point \(x_*\) is a (local) minimum, then

\[\langle\partial f_0(x_*),v\rangle \ge 0\text{ in all "feasible directions" }v.\]

If \(x_*\) is in the interior of \(\cC\), then the set of “feasible directions” is \(\reals^d\), and we recover the usual zero-gradient (necessary) optimality condition:

\[\langle\partial f_0(x_*),v\rangle \ge 0\text{ for all }v\in\reals^d\iff\partial f_0(x_*) = 0.\]

If \(x_*\) is on the boundary of \(\cC\), then the set of “feasible directions” is intuitively the set of all directions that point “inward” from \(x_*\). As we shall see, this is the set of tangent vectors.
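To make this concrete, suppose \(\cC = \{x\in\reals^2\mid x_2 \ge 0\}\) and \(x_*\) lies on the boundary (so \(x_{*,2} = 0\)). The feasible directions at \(x_*\) are the vectors \(v\) with \(v_2 \ge 0\), and requiring \(\langle\partial f_0(x_*),v\rangle \ge 0\) for all such \(v\) is equivalent to

\[\tfrac{\partial f_0}{\partial x_1}(x_*) = 0\quad\text{and}\quad\tfrac{\partial f_0}{\partial x_2}(x_*) \ge 0:\]

taking \(v = (\pm 1, 0)\) forces the first partial derivative to vanish, while taking \(v = (0,1)\) forces the second to be non-negative. In other words, \(-\partial f_0(x_*)\) either vanishes or points “outward”, across the boundary of \(\cC\).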

Tangent vectors

At first blush, we might define tangent vectors at \(x\in\cC\) as directions \(v\) such that \(x + \eta v\in\cC\) for all small enough step sizes \(\eta > 0\). Although this definition works well when \(\cC\) is defined by linear constraints, it is too restrictive for \(\cC\) with curved boundaries. For example, if \(\cC\) is a curve in \(\reals^2\), then it may not be possible to move in any direction and remain in \(\cC\).
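To see this concretely, take \(\cC = \{x\in\reals^2\mid x_1^2 + x_2^2 = 1\}\) (the unit circle) and \(x = (1,0)\). For any \(v\ne 0\),

\[\|x + \eta v\|_2^2 = 1 + 2\eta v_1 + \eta^2\|v\|_2^2 \ne 1\text{ for all small enough }\eta > 0\]

(if \(v_1\ne 0\), the linear term dominates; if \(v_1 = 0\), the quadratic term is positive), so no nonzero direction satisfies the naive definition, even though the circle clearly has a tangent line at \(x\).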

Tangent vectors: A vector \(v\in\reals^d\) is tangent to \(\cC\) at \(x\in\cC\) iff there are sequences \((x_n)\subset\cC\), \((x_n) \to x\) and \((\eta_n)\searrow 0\) such that

\[\frac{1}{\eta_n}(x_n - x) \to v,\quad\text{or equivalently}\quad x_n - x = \eta_nv + o(\eta_n).\]

Intuitively, \((x_n)\) traces out a curve passing through \(x\), and the ray \(x + \eta v\), \(\eta \ge 0\), is tangent to this curve at \(x\). Compared to the initial (overly restrictive) definition of tangent vector, this definition adds an \(o(\eta_n)\) fudge factor.
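Returning to the circle example above, the definition now behaves as expected: at \(x = (1,0)\), the direction \(v = (0,1)\) is a tangent vector. Indeed, take \(\eta_n = \frac1n\) and \(x_n = (\cos\eta_n,\sin\eta_n)\in\cC\), so that

\[x_n - x = (\cos\eta_n - 1,\,\sin\eta_n) = \eta_n(0,1) + o(\eta_n),\]

because \(\cos\eta_n - 1 = O(\eta_n^2)\) and \(\sin\eta_n = \eta_n + O(\eta_n^3)\). The \(o(\eta_n)\) fudge factor absorbs exactly the curvature that defeated the naive definition.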

We note that any direction \(v\) such that \(x + \eta v\in\cC\) for all small enough \(\eta\) is a tangent vector. Indeed, let \(\eta_n\) be a sequence of small enough step sizes that converges to zero and \(x_n \triangleq x + \eta_nv\). The assumption \(x + \eta v\in\cC\) for all small enough \(\eta\) ensures \(x_n\) is in \(\cC\), and \(x_n - x = \eta_nv\), so \(\frac{1}{\eta_n}(x_n - x) = v\) trivially converges to \(v\).

We also note that this definition of tangent vector coincides with the tangent space of a smooth manifold. Recall that the tangent space of a smooth manifold \(\cM\) at \(x\in\cM\) consists of the derivatives (at \(x\)) of all smooth curves in \(\cM\) passing through \(x\). Let \(\gamma:[-\delta,\delta]\to\cC\) be a differentiable curve in \(\cC\) such that \(x = \gamma(0)\). To see that \(\dot{\gamma}(0)\) is a tangent vector, let \(\eta_n\searrow 0\) and \(x_n\triangleq\gamma(\eta_n)\). We have

\[x_n - x = \gamma(\eta_n) - \gamma(0) = \eta_n\dot{\gamma}(0) + o(\eta_n),\]

where we used the definition of the derivative in the second step.

Finally, we claim that the set of all tangent vectors at \(x\in\cC\) is a closed cone called the tangent cone \(T_{\cC}(x)\). Recall that a subset \(K\subset\reals^d\) is a cone iff it is closed under non-negative scalar multiplication: if \(x\in K\), then \(\alpha x\in K\) for any \(\alpha \ge 0\). The proof of this claim is elementary, and we leave the details as an exercise to the reader.
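As a concrete example, consider the non-negative orthant \(\cC = \{x\in\reals^d\mid x \ge 0\}\). At a point \(x\in\cC\), the tangent cone is

\[T_{\cC}(x) = \{v\in\reals^d\mid v_i \ge 0\text{ for all }i\text{ such that }x_i = 0\},\]

which is indeed a closed cone: coordinates with \(x_i > 0\) are unrestricted (we can move a little in either direction and stay in \(\cC\)), while coordinates with \(x_i = 0\) may only move “inward”.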

Optimality conditions for the general optimization problem

The normal vectors at a point \(x\in\cC\) are the vectors \(u\) that make an obtuse (or right) angle with all tangent vectors: \(\langle u,v\rangle \le 0\) for all \(v\in T_{\cC}(x)\). It is not hard to check that the set of all normal vectors at a point is a closed convex cone, called the normal cone \(N_{\cC}(x)\). We note that the tangent cone of a non-convex set may not be convex, but the normal cone is always convex.
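For example, let \(\cC = \{x\in\reals^2\mid x_1x_2 = 0\}\) be the union of the two coordinate axes. At the origin, \(T_{\cC}(0) = \cC\), which is a closed cone but not convex: \((1,0)\) and \((0,1)\) are both tangent vectors, yet their sum \((1,1)\) is not. The normal cone, on the other hand, is

\[N_{\cC}(0) = \{0\},\]

since any \(u\) satisfying \(\langle u,v\rangle \le 0\) for \(v = (\pm 1,0)\) and \(v = (0,\pm 1)\) must be zero; it is (trivially) convex.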

Optimality conditions for the general optimization problem: If \(x_*\) is a local minimum, then \(-\partial f_0(x_*)\in N_{\cC}(x_*)\). This is equivalent to

\[\langle\partial f_0(x_*),v\rangle \ge 0\text{ for all }v\in T_{\cC}(x_*).\]

To see this, let \(v\in T_{\cC}(x_*)\) be an arbitrary tangent vector. There are \((x_n)\subset\cC\) with \(x_n\to x_*\) and \(\eta_n\searrow 0\) such that \(\frac{1}{\eta_n}(x_n - x_*) \to v\). We have

\[\begin{aligned} 0 &\le f_0(x_n) - f_0(x_*) \\ &= f_0(x_* + \eta_n v) - f_0(x_*) + O(\|x_n - (x_* + \eta_nv)\|) \\ &= f_0(x_* + \eta_n v) - f_0(x_*) + o(\eta_n), \end{aligned}\]

where the first step is the (local) optimality of \(x_*\), the second step is the local Lipschitz continuity of \(f_0\) (a consequence of continuous differentiability), and the third step is the definition of tangent vector, which gives \(\|x_n - (x_* + \eta_nv)\| = o(\eta_n)\). We divide both sides by \(\eta_n\) and take limits to obtain

\[0 \le \frac{1}{\eta_n}(f_0(x_* + \eta_n v) - f_0(x_*)) + o(1) \to \langle\partial f_0(x_*),v\rangle.\]

This optimality condition is intuitive: if \(x_*\) is a local minimum, then there is no direction \(v\) that is (i) tangent to the feasible set and (ii) a (strict) descent direction.
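To see the condition in action, consider minimizing \(f_0\) over the non-negative orthant \(\cC = \{x\in\reals^d\mid x \ge 0\}\). Using the tangent cone computed above, the condition \(\langle\partial f_0(x_*),v\rangle \ge 0\) for all \(v\in T_{\cC}(x_*)\) unpacks to

\[\tfrac{\partial f_0}{\partial x_i}(x_*) = 0\text{ whenever }x_{*,i} > 0\quad\text{and}\quad\tfrac{\partial f_0}{\partial x_i}(x_*) \ge 0\text{ whenever }x_{*,i} = 0,\]

which is the familiar complementarity form of the first-order conditions for minimization over the orthant.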

Optimality conditions for convex problems

The main results here are

  1. \(\def\cone{\mathop{\rm cone}}T_{\cC}(x) = \overline{\cone(\cC - x)}\), the closure of the conic hull of \(\cC - x\);
  2. \(N_{\cC}(x) = \{u\in\reals^d\mid\langle u,x'-x\rangle \le 0\text{ for all }x'\in\cC\}\);
  3. The optimality condition \(-\partial f_0(x_*)\in N_{\cC}(x_*)\) is necessary and sufficient (a worked example follows below).
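As a worked example of the last result, consider projecting a point \(z\in\reals^d\) onto a closed convex set \(\cC\), i.e., minimizing \(f_0(x) = \frac12\|x - z\|_2^2\) over \(\cC\). Here \(\partial f_0(x) = x - z\), so the necessary and sufficient condition \(-\partial f_0(x_*)\in N_{\cC}(x_*)\), combined with the expression for the normal cone above, reads

\[\langle z - x_*, x' - x_*\rangle \le 0\text{ for all }x'\in\cC,\]

which is the usual variational characterization of the Euclidean projection onto a convex set.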

Posted on March 18, 2021 from San Francisco, CA.