STATS 606

Subgradient properties proofs

This post accompanies the slides on subgradients. Please see the slides for the problem setup and definitions.

In this post, \(x\in\reals^n\) is always a point in the interior of \(\dom f\). We start by recalling the epigraphical interpretation of subgradients.

Obs: $g\in\partial f(x)$ iff $(g,-1)\in\reals^n\times\reals$ defines a supporting hyperplane to $\epi(f)$ at $(x,f(x))$; i.e. \(\begin{bmatrix}g \\\\ -1\end{bmatrix}^\top\left(\begin{bmatrix}y \\\\ t\end{bmatrix} - \begin{bmatrix}x\\\\f(x)\end{bmatrix}\right)\le 0\text{ for all }(y,t)\in\epi(f).\)

Thus \(\partial (x)\) is non-empty as long as $\epi(f)$ has supporting hyperplanes at $(x,f(x))$.

Directional derivatives of convex functions

Recall the directional derivative of a function \(f:\reals^n\to\reals\) at \(x\in\dom f\) in the direction \(d\ne 0\) is

\[f'(x;d)\triangleq\lim_{\eta\searrow 0}\frac1\eta(f(x+\eta d) - f(x)).\]

If the limit is not well-defined, then the directional derivative (of \(f\) at \(x\) in the direction \(d\)) does not exist, and \(f\) is not directionally differentiable at \(x\) in the direction \(d\).

If \(f\) is differentiable, then Taylor’s theorem implies

\[\begin{aligned} f'(x;d) &= \lim_{\eta\searrow 0}\frac1\eta(f(x+\eta d) - f(x)) \\ &= \lim_{\eta\searrow 0}\frac1\eta(\eta\nabla f(x)^\top d + o(\eta)) \\ &= \lim\nabla f(x)^\top d + o(1). \end{aligned}\]

If \(f\) is convex, then the map \(\eta\to\frac1\eta(f(x+\eta d) - f(x))\) is non-increasing; i.e. \(\frac1\eta_1(f(x+\eta_1d) - f(x))\le\frac1\eta_1(f(x+\eta_2d) - f(x)).\)

(check this)

This implies the directional derivative of a convex function is equivalently \(f'(x;d)\triangleq\inf_{\eta > 0}\frac1\eta(f(x+\eta d) - f(x)).\)

Thm: \(f'(x;d) = \sup_{g\in\partial f(x)} g^\top d\).

It is not hard to show that \(f'(x;d) \ge g^\top d\) for any \(g\in\partial f(x)\). Indeed,

\[\begin{aligned} f'(x;d) &= \lim_{\eta\searrow 0}\frac1\eta(f(x+\eta d) - f(x)) \\ &\ge \lim_{\eta\searrow 0}\frac1\eta g^\top(\eta d) \\ &= g^\top d. \end{aligned}\]

The hard part is showing the sup is attained. This part relies on the the observation that \(f'(x;d)\) is actually convex WRT \(d\). Indeed,

\[\begin{aligned} &f'(x;\alpha d_1 + (1-\alpha)d_2) \\ &\quad= \lim_{\eta\searrow 0}\frac1\eta(f(x + \alpha\eta d_1 + (1-\alpha)\eta d_2) - f(x)) \\ &\quad= \lim_{\eta\searrow 0}\frac1\eta(f(\alpha(x + \eta d_1) + (1-\alpha)(x + \eta d_2)) - \eta f(x) - (1-\eta)f(x)) \\ &\quad\le \lim_{\eta\searrow 0}\frac1\eta(\alpha f(x + \eta d_1) + (1-\alpha)f(x + \eta d_2) - \eta f(x) - (1-\eta)f(x)) \\ &\quad\le \alpha\lim_{\eta\searrow 0}\frac1\eta(\alpha f(x + \eta d_1) - f(x)) + (1-\alpha)\lim_{\eta\searrow 0}\frac1\eta((1-\alpha)f(x + \eta d_2) - (1-\eta)f(x)) \\ &\quad= \alpha f'(x;d_1) + (1-\alpha)f'(x;d_2). \end{aligned}\]

Since \(f'(x;d)\) is convex WRT \(d\), it has a subgradient \(g\) at \(d\):

\[\lambda f'(x;y) = f'(x;\lambda y) \ge f(x;d) + g^\top(\lambda y - d).\]

If we let \(\lambda\nearrow 0\), we obtain \(f'(x;y) \ge g^\top y\). This implies

\[f(x+y) - f(x) \ge \inf_{\eta > 0}\frac1\eta(f(x + \eta y) - f(x)) = f'(x;y) \ge g^\top y,\]

so \(g\) is also a subgradient of \(f\) at \(x\). On the other hand, if we let \(\lambda\searrow 0\), we obtain \(f(x;d) \ge g^\top d\). Thus the sup is attained.

Posted on March 14, 2023 from Ann Arbor, MI.