STATS 606

Optimality conditions

Let \(\cC\subset\reals^d\) be closed, and \(f_0:\reals^d\to\reals\) be a continuously differentiable function. Consider the (possibly non-convex) general optimization problem

\[\begin{aligned} &\min\nolimits_{x\in\reals^d} & & f_0(x) \\ &\subjectto & & x\in\cC \end{aligned}.\]

Intuitively, if a point \(x_*\) is a (local) minimum, then

\[\langle\partial f_0(x_*),v\rangle \ge 0\text{ in all "feasible directions" }v.\]

If \(x_*\) is in the interior of \(\cC\), then the set of “feasible directions” is \(\reals^d\), and we recover the usual zero-gradient (necessary) optimality condition:

\[\langle\partial f_0(x_*),v\rangle \ge 0\text{ for all }v\in\reals^d\iff\partial f_0(x_*) = 0.\]

If \(x_*\) is on the boundary of \(\cC\), then the set of “feasible directions” is intuitively the set of all directions that point “inward” from \(x_*\). As we shall see, this is the set of tangent vectors.
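To make this concrete, suppose \(\cC = \{x\in\reals^2\mid x_2 \ge 0\}\) and \(x_*\) lies on the boundary (so \(x_{*,2} = 0\)). The feasible directions at \(x_*\) are the vectors \(v\) with \(v_2 \ge 0\), and requiring \(\langle\partial f_0(x_*),v\rangle \ge 0\) for all such \(v\) is equivalent to

\[\tfrac{\partial f_0}{\partial x_1}(x_*) = 0\quad\text{and}\quad\tfrac{\partial f_0}{\partial x_2}(x_*) \ge 0:\]

taking \(v = (\pm 1, 0)\) forces the first partial derivative to vanish, while taking \(v = (0,1)\) forces the second to be non-negative. In other words, \(-\partial f_0(x_*)\) either vanishes or points “outward”, across the boundary of \(\cC\).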

Tangent vectors

At first blush, we might define tangent vectors at \(x\in\cC\) as directions \(v\) such that \(x + \eta v\in\cC\) for all small enough step sizes \(\eta > 0\). Although this definition works well when \(\cC\) is defined by linear constraints, it is too restrictive for \(\cC\) with curved boundaries. For example, if \(\cC\) is a curve in \(\reals^2\), then it may not be possible to move in any direction and remain in \(\cC\).
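To see this concretely, take \(\cC = \{x\in\reals^2\mid x_1^2 + x_2^2 = 1\}\) (the unit circle) and \(x = (1,0)\). For any \(v\ne 0\),

\[\|x + \eta v\|_2^2 = 1 + 2\eta v_1 + \eta^2\|v\|_2^2 \ne 1\text{ for all small enough }\eta > 0\]

(if \(v_1\ne 0\), the linear term dominates; if \(v_1 = 0\), the quadratic term is positive), so no nonzero direction satisfies the naive definition, even though the circle clearly has a tangent line at \(x\).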

Tangent vectors: A vector \(v\in\reals^d\) is tangent to \(\cC\) at \(x\in\cC\) iff there are sequences \((x_n)\subset\cC\), \((x_n) \to x\) and \((\eta_n)\searrow 0\) such that

\[\frac{1}{\eta_n}(x_n - x) \to v,\quad\text{or equivalently}\quad x_n - x = \eta_nv + o(\eta_n).\]

Intuitively, \((x_n)\) traces out a curve passing through \(x\), and the ray \(x + \eta v\), \(\eta \ge 0\), is tangent to this curve at \(x\). Compared to the initial (overly restrictive) definition of tangent vector, this definition adds an \(o(\eta_n)\) fudge factor.
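Returning to the circle example above, the definition now behaves as expected: at \(x = (1,0)\), the direction \(v = (0,1)\) is a tangent vector. Indeed, take \(\eta_n = \frac1n\) and \(x_n = (\cos\eta_n,\sin\eta_n)\in\cC\), so that

\[x_n - x = (\cos\eta_n - 1,\,\sin\eta_n) = \eta_n(0,1) + o(\eta_n),\]

because \(\cos\eta_n - 1 = O(\eta_n^2)\) and \(\sin\eta_n = \eta_n + O(\eta_n^3)\). The \(o(\eta_n)\) fudge factor absorbs exactly the curvature that defeated the naive definition.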

We note that any direction \(v\) such that \(x + \eta v\in\cC\) for all small enough \(\eta\) is a tangent vector. Indeed, let \(\eta_n\) be a sequence of small enough step sizes that converges to zero and \(x_n \triangleq x + \eta_nv\). The assumption \(x + \eta v\in\cC\) for all small enough \(\eta\) ensures \(x_n\) is in \(\cC\), and \(x_n - x = \eta_nv\), so \(\frac{1}{\eta_n}(x_n - x) = v\) trivially converges to \(v\).

We also note that this definition of tangent vector coincides with the tangent space of a smooth manifold. Recall that the tangent space of a smooth manifold \(\cM\) at \(x\in\cM\) consists of the derivatives (at \(x\)) of all smooth curves in \(\cM\) passing through \(x\). Let \(\gamma:[-\delta,\delta]\to\cC\) be a differentiable curve in \(\cC\) such that \(x = \gamma(0)\). To see that \(\dot{\gamma}(0)\) is a tangent vector, let \(\eta_n\searrow 0\) and \(x_n\triangleq\gamma(\eta_n)\). We have

\[x_n - x = \gamma(\eta_n) - \gamma(0) = \eta_n\dot{\gamma}(0) + o(\eta_n),\]

where we used the definition of the derivative in the second step.

Finally, we claim that the set of all tangent vectors at \(x\in\cC\) is a closed cone called the tangent cone \(T_{\cC}(x)\). Recall that a subset \(K\subset\reals^d\) is a cone iff it is closed under non-negative scalar multiplication: if \(x\in K\), then \(\alpha x\in K\) for any \(\alpha \ge 0\). The proof of this claim is elementary, and we leave the details as an exercise to the reader.
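As a concrete example, consider the non-negative orthant \(\cC = \{x\in\reals^d\mid x \ge 0\}\). At a point \(x\in\cC\), the tangent cone is

\[T_{\cC}(x) = \{v\in\reals^d\mid v_i \ge 0\text{ for all }i\text{ such that }x_i = 0\},\]

which is indeed a closed cone: coordinates with \(x_i > 0\) are unrestricted (we can move a little in either direction and stay in \(\cC\)), while coordinates with \(x_i = 0\) may only move “inward”.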

Optimality conditions for the general optimization problem

The normal vectors at a point \(x\in\cC\) are the vectors \(u\) that make an obtuse (or right) angle with all tangent vectors: \(\langle u,v\rangle \le 0\) for all \(v\in T_{\cC}(x)\). It is not hard to check that the set of all normal vectors at a point is a closed convex cone, called the normal cone \(N_{\cC}(x)\). We note that the tangent cone of a non-convex set may not be convex, but the normal cone is always convex.
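For example, let \(\cC = \{x\in\reals^2\mid x_1x_2 = 0\}\) be the union of the two coordinate axes. At the origin, \(T_{\cC}(0) = \cC\), which is a closed cone but not convex: \((1,0)\) and \((0,1)\) are both tangent vectors, yet their sum \((1,1)\) is not. The normal cone, on the other hand, is

\[N_{\cC}(0) = \{0\},\]

since any \(u\) satisfying \(\langle u,v\rangle \le 0\) for \(v = (\pm 1,0)\) and \(v = (0,\pm 1)\) must be zero; it is (trivially) convex.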

Optimality conditions for the general optimization problem: If \(x_*\) is a local minimum, then \(-\partial f_0(x_*)\in N_{\cC}(x_*)\). This is equivalent to

\[\langle\partial f_0(x_*),v\rangle \ge 0\text{ for all }v\in T_{\cC}(x_*).\]

To see this, let \(v\in T_{\cC}(x_*)\) be an arbitrary tangent vector. There are \((x_n)\subset\cC\) with \(x_n\to x_*\) and \(\eta_n\searrow 0\) such that \(\frac{1}{\eta_n}(x_n - x_*) \to v\). We have

\[\begin{aligned} 0 &\le f_0(x_n) - f_0(x_*) \\ &= f_0(x_* + \eta_n v) - f_0(x_*) + O(\|x_n - (x_* + \eta_nv)\|) \\ &= f_0(x_* + \eta_n v) - f_0(x_*) + o(\eta_n), \end{aligned}\]

where the first step is the (local) optimality of \(x_*\), the second step is the local Lipschitz continuity of \(f_0\) (a consequence of continuous differentiability), and the third step is the definition of tangent vector, which gives \(\|x_n - (x_* + \eta_nv)\| = o(\eta_n)\). We divide both sides by \(\eta_n\) and take limits to obtain

\[0 \le \frac{1}{\eta_n}(f_0(x_* + \eta_n v) - f_0(x_*)) + o(1) \to \langle\partial f_0(x_*),v\rangle.\]

This optimality condition is intuitive: if \(x_*\) is a local minimum, then there is no direction \(v\) that is (i) tangent to the feasible set and (ii) a (strict) descent direction.
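To see the condition in action, consider minimizing \(f_0\) over the non-negative orthant \(\cC = \{x\in\reals^d\mid x \ge 0\}\). Using the tangent cone computed above, the condition \(\langle\partial f_0(x_*),v\rangle \ge 0\) for all \(v\in T_{\cC}(x_*)\) unpacks to

\[\tfrac{\partial f_0}{\partial x_i}(x_*) = 0\text{ whenever }x_{*,i} > 0\quad\text{and}\quad\tfrac{\partial f_0}{\partial x_i}(x_*) \ge 0\text{ whenever }x_{*,i} = 0,\]

which is the familiar complementarity form of the first-order conditions for minimization over the orthant.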

Optimality conditions for convex problems

The main results here are

  1. \(\def\cone{\mathop{\rm cone}}T_{\cC}(x) = \overline{\cone(\cC - x)}\), the closure of the conic hull of \(\cC - x\);
  2. \(N_{\cC}(x) = \{u\in\reals^d\mid\langle u,x'-x\rangle \le 0\text{ for all }x'\in\cC\}\);
  3. The optimality condition \(-\partial f_0(x_*)\in N_{\cC}(x_*)\) is necessary and sufficient (a worked example follows below).
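As a worked example of the last result, consider projecting a point \(z\in\reals^d\) onto a closed convex set \(\cC\), i.e., minimizing \(f_0(x) = \frac12\|x - z\|_2^2\) over \(\cC\). Here \(\partial f_0(x) = x - z\), so the necessary and sufficient condition \(-\partial f_0(x_*)\in N_{\cC}(x_*)\), combined with the expression for the normal cone above, reads

\[\langle z - x_*, x' - x_*\rangle \le 0\text{ for all }x'\in\cC,\]

which is the usual variational characterization of the Euclidean projection onto a convex set.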

Posted on March 18, 2021 from San Francisco, CA.