STATS 606

The Karush-Kuhn-Tucker (KKT) conditions

At a high-level, the KKT conditions are an instance of the (necessary) optimality condition for the general optimization problem to non-linear optimization problems in standard form. To keep things simple, we consider a non-linear optimization with only inequality constraints:

\[\begin{aligned} & \min\nolimits_{x\in\reals^d} && f_0(x) \\ & \subjectto && \{f_i(x) \le 0\}_{i=1}^m \end{aligned}.\]

Characterizing the tangent vectors

Define \(\cC\) as the feasible set of the preceding optimization problem:

\[\cC \triangleq \{x\in\reals^d\mid\{f_i(x) \le 0\}_{i=1}^m\}.\]

At first blush, we guess that the tangent cone at \(x\in\cC\) is

\[T_{\cC}(x) \overset{?}{=} \{v\in\reals^d:\{\langle\partial f_i(x),v\rangle\le 0\}_{i\in\cA}\},\]

where \(\cA \triangleq \{i\in[m]\mid f_i(x) = 0\}\) is the set of indicies of the active constraints at \(x\). This guess is motivated by the observation that we must have \(\langle\partial f_i(x),v\rangle \le 0\rangle\) for all \(i\in\cA\) or moving in the direction \(v\) will violate an active inequality constraint. Unforunately, this guess is not restrictive enough: there are pathological cases in which there are \(v\)’s that satisfy the preceding guess, but are not tangent vectors.

Consider the set \(\cC \triangleq \{x\in\reals^2\mid\frac12\|x + e_1\|_2^2 \le \frac12, \frac12\|x - e_1\|_2^2 \le \frac12\}\), where \(e_1 \triangleq (1,0)\). This is the intersection of two disks of radius 1: one centered at \(e_1\) and another centered at \(-e_1\). The two disks only intersect at the origin. Thus \(\cC = \{0\}\), and \(T_{\cC}(0) = \{0\}\). On the other hand, the preceding guess is

\[T_{\cC}(x) \overset{?}{=} \{v\in\reals^2\mid\langle e_1,v\rangle = 0\}.\]

To rule out such pathological cases, we impose Slater’s constraint qualification (CQ).

Let \(L_{\cC}(x)$ \triangleq \{v\in\reals^d:\{\langle\partial f_i(x),v\rangle\le 0\}_{i\in\cA}\}\). It is not hard to check that \(T_{\cC}(x) \subset L_{\cC}(x)\). We leave the details as an exercise and focus on establishing

To see that Slater’s CQ

The KKT conditions

Recall the optimality condition of the general optimization problem: if \(x_*\) is a local minimum, then \(\langle\partial f_0(x_*),v\rangle \ge 0\) for all \(v\in T_{\cC}(x_*)\). In light of the preceding characterization of \(T_{\cC}(x_*)\), this is equivalent to there is no \(v\in\reals^d\) such that

\[\begin{aligned} \{\langle\partial f_i(x_*),v\rangle &\le 0\}_{i\in\cA}, \\ -\langle\partial f_0(x_*),v\rangle &< 0. \end{aligned}\]

Consider the \(v\) in the preceding system of inequalities as (the normal vector) of a hyperplane thru the origin. The optimality condition has a geometric interpretation: there is no hyperplane (thru the origin) that separates \(-\partial f_0(x_*)\) from \(\{\partial f_i(x_*)\}_{i\in\cA}\). This implies \(\partial f_0(x_*)\) is in the conic hull of \(\{\partial f_i(x_*)\}_{i\in\cA}\): i.e. there is \(\lambda\in\reals_+^m\) such that

\[\textstyle-\partial f_0(x_*) = \sum_{i\in\cA}\lambda_i\partial f_i(x_*).\]

This is almost the stationarity condition in the KKT conditions. We pad \(\lambda\) with zero entries for the constraints that are inactive to get

\[\textstyle-\partial f_0(x_*) = \sum_{i=1}^m\lambda_i\partial f_i(x_*)\]

to get the stationarity condition. The complementary slackness and dual feasibility conditions follows from the construction of \(\lambda\). Finally, the primal feasibility condition is a consequence of the fact that \(x_*\) is a local minimum (hence it’s feasible).

Posted on March 18, 2021 from San Francisco, CA.