Subgradient properties proofs

This post accompanies the slides on subgradients. Please see the slides for the problem setup and definitions.

In this post, $x\in\reals^n$ is always a point in the interior of $\dom f$. We start by recalling the epigraphical interpretation of subgradients.

Obs: $g\in\partial f(x)$ iff $(g,-1)\in\reals^n\times\reals$ defines a supporting hyperplane to $\epi(f)$ at $(x,f(x))$; i.e. $\begin{bmatrix}g \\\\ -1\end{bmatrix}^\top\left(\begin{bmatrix}y \\\\ t\end{bmatrix} - \begin{bmatrix}x\\\\f(x)\end{bmatrix}\right)\le 0\text{ for all }(y,t)\in\epi(f).$

Thus $\partial (x)$ is non-empty as long as $\epi(f)$ has supporting hyperplanes at $(x,f(x))$.

Directional derivatives of convex functions

Recall the directional derivative of a function $f:\reals^n\to\reals$ at $x\in\dom f$ in the direction $d\ne 0$ is

\[f'(x;d)\triangleq\lim_{\eta\searrow 0}\frac1\eta(f(x+\eta d) - f(x)).\]

If the limit is not well-defined, then the directional derivative (of $f$ at $x$ in the direction $d$) does not exist, and $f$ is not directionally differentiable at $x$ in the direction $d$.

If $f$ is differentiable, then Taylor’s theorem implies

\[\begin{aligned} f'(x;d) &= \lim_{\eta\searrow 0}\frac1\eta(f(x+\eta d) - f(x)) \\ &= \lim_{\eta\searrow 0}\frac1\eta(\eta\nabla f(x)^\top d + o(\eta)) \\ &= \lim\nabla f(x)^\top d + o(1). \end{aligned}\]

If $f$ is convex, then the map $\eta\to\frac1\eta(f(x+\eta d) - f(x))$ is non-increasing; i.e. $\frac1\eta_1(f(x+\eta_1d) - f(x))\le\frac1\eta_1(f(x+\eta_2d) - f(x)).$

(check this)

This implies the directional derivative of a convex function is equivalently $f'(x;d)\triangleq\inf_{\eta > 0}\frac1\eta(f(x+\eta d) - f(x)).$

Thm: $f'(x;d) = \sup_{g\in\partial f(x)} g^\top d$.

It is not hard to show that $f'(x;d) \ge g^\top d$ for any $g\in\partial f(x)$. Indeed,

\[\begin{aligned} f'(x;d) &= \lim_{\eta\searrow 0}\frac1\eta(f(x+\eta d) - f(x)) \\ &\ge \lim_{\eta\searrow 0}\frac1\eta g^\top(\eta d) \\ &= g^\top d. \end{aligned}\]

The hard part is showing the sup is attained. This part relies on the the observation that $f'(x;d)$ is actually convex WRT $d$. Indeed,

\[\begin{aligned} &f'(x;\alpha d_1 + (1-\alpha)d_2) \\ &\quad= \lim_{\eta\searrow 0}\frac1\eta(f(x + \alpha\eta d_1 + (1-\alpha)\eta d_2) - f(x)) \\ &\quad= \lim_{\eta\searrow 0}\frac1\eta(f(\alpha(x + \eta d_1) + (1-\alpha)(x + \eta d_2)) - \eta f(x) - (1-\eta)f(x)) \\ &\quad\le \lim_{\eta\searrow 0}\frac1\eta(\alpha f(x + \eta d_1) + (1-\alpha)f(x + \eta d_2) - \eta f(x) - (1-\eta)f(x)) \\ &\quad\le \alpha\lim_{\eta\searrow 0}\frac1\eta(\alpha f(x + \eta d_1) - f(x)) + (1-\alpha)\lim_{\eta\searrow 0}\frac1\eta((1-\alpha)f(x + \eta d_2) - (1-\eta)f(x)) \\ &\quad= \alpha f'(x;d_1) + (1-\alpha)f'(x;d_2). \end{aligned}\]

Since $f'(x;d)$ is convex WRT $d$, it has a subgradient $g$ at $d$:

\[\lambda f'(x;y) = f'(x;\lambda y) \ge f(x;d) + g^\top(\lambda y - d).\]

If we let $\lambda\nearrow 0$, we obtain $f'(x;y) \ge g^\top y$. This implies

\[f(x+y) - f(x) \ge \inf_{\eta > 0}\frac1\eta(f(x + \eta y) - f(x)) = f'(x;y) \ge g^\top y,\]

so $g$ is also a subgradient of $f$ at $x$. On the other hand, if we let $\lambda\searrow 0$, we obtain $f(x;d) \ge g^\top d$. Thus the sup is attained.

Posted on March 14, 2023 from Ann Arbor, MI.