8

The definition of strongly convex from Wikipedia:

It is not necessary for a function to be differentiable in order to be strongly convex. A third definition for a strongly convex function, with parameter $m$, is that, for all $x$, $y$ in the domain and $t\in [0,1]$, $$f(tx+(1-t)y) \le t f(x)+(1-t)f(y) - \frac{1}{2} m t(1-t) \|x-y\|_2^2.$$

Prove that the $2$-norm squared $f(w) = m\|w\|^2 $ is $m$ strongly convex

I have so far tried to use the triangle inequality but I cannot derive that last negative term.

Leo
  • 1,539
  • Is this homework? If so, what have you tried so far, and where are you stuck? – Lopsy Jan 29 '12 at 20:14
  • Sorry, my bad. I edited the question. This isn't homework, I met this claim in lecture notes along with a promise that the proof if near trivial but I can't get through with it. – Leo Jan 29 '12 at 20:34

3 Answers3

10

The key is to use the fact that the $2$-norm comes from an inner product. We have \begin{align*} f(tx+(1-t)y)-t(f(x)-(1-t)f(y)&=mt^2||x||^2+2mt(1-t)\langle x,y\rangle+m(1-t)^2||y||^2\\ & - mt||x||^2-m(1-t)||y||^2\\ &=mt(t-1)||x||^2+m(1-t)(1-t-1)||y||\\ &+2mt(1-t)\langle x,y\rangle\\ &=-mt(1-t)||x||^2+2mt(1-t)\langle x,y\rangle -mt(1-t)||y||^2\\ &=-mt(1-t)(||x||^2-2\langle x,y\rangle+||y||^2)\\ &=-mt(1-t)||x-y||^2\\ &\leq -\frac 12mt(1-t)||x-y||^2 \end{align*} since $mt(1-t)||x-y||^2\geq 0$.

Davide Giraudo
  • 172,925
  • 1
    Thanks, Can you see any sense in this beyond algebric "magic"? missing squares in second line btw. – Leo Jan 29 '12 at 23:55
  • You can draw a picture for the dimension $1$. – Davide Giraudo Jan 30 '12 at 10:10
  • 3
    The picture of x^2 doesn't provide much insight. Simple convexity 1d pictures explain very well but it's hard if not impossible to guess strong convexity from a picture. I can't imagine for example how picturing this could guide the derivation of the proof. – Leo Jan 30 '12 at 11:55
  • See my new answer below for more intuition – Dan Feldman Feb 10 '23 at 06:35
4

If the function $f$ is twice continuously differentiable, then it is strongly convex with parameter $\mu>0$ if and only if $\nabla^2 f(x) \succeq \mu I$ for all $x$ in the domain, where $I$ is the identity and $\nabla^2f$ is the Hessian matrix, and the inequality $\succeq$ means that $\nabla^2 f(x) - \mu I$ is positive semi-definite.

The definition above is equivalent to the definition given by the OP if $f$ is twice continuously differentiable (see here for more details).

The function $f(w)=m\|w\|^2$ is twice continuously differentiable. We have that $$ \nabla f(w)=2mw \quad\text{and}\quad\nabla^2f(w)=2mI. $$ Hence, the function $f$ is strongly convex with parameter $2m$.

Cm7F7Bb
  • 17,364
1

The technical steps in previous answers seem easy to check but hard to understand, as users stated in the comments. The magical identity is: $$p||x||^2+(1-p)||y||^2-||px+(1-p)y||^2=p(1-p)||x-y||^2,$$ for every $p\in [0,1]$ and $x,y\in R^d$.

For the case $d=1$, let $X$ be a Bernoullii distribution variable such that $Pr(X=1)=p$. Hence, for $Z=(x-y)X+y$, we have $Z=x$ with probabilty $p$, and $Z=y$ with probability $1-p$. Therefore, $$px^2+(1-p)y^2-(px+(1-p)y)^2 =E[Z^2]-E^2[Z]=VAR(Z)=VAR((x-y)X)=(x-y)^2 VAR(X)=(x-y)^2 p(1-p). $$

For $d\geq 2$: $$p||x||^2+(1-p)||y||^2-||px+(1-p)y||^2 =\sum_{i=1}^d \Big(px_i^2+(1-p)y_i^2-(px_i+(1-p)y_i)^2\Big)=\sum_{i=1}^d \Big(p(1-p)(x_i-y_i)^2\Big)=p(1-p)||x-y||^2,$$ where the second equality is due to the case $d=1$.

Intuition: Usually, when there is $p\in[0,1]$ and $1-p$ there must be a relation to probability theory. The right hand side is the variance $VAR(X)$ of a random variable $X$ with Bernoulli distrubution of probability $p$, after assuming $x-y=1$ and $x\geq y$ wlog. Otherwise, divide both $x$ and $y$ by $x-y$, and switch between $x$ and $y$ if necessary. For $x=1$ and $y=0$, let $Pr(X=1)=Pr(X=x-y)=Pr(X=x):=p$ and $Pr(X=0)=Pr(X=y):=1-p$, to obtain $$px^2+(1-p)y^2-(px+(1-p)y)^2 =E[X^2]-E^2[X]=VAR(X)= =p(1-p).$$ More generally, for $x\in R$ and $y=x-1$, recall that the variance is invariant to translation. So, replace $X$ by $X+y$ that equals to $(x-y)+y=x$ with probability $p$, and $X+y=0+y=y$ otherwise, to obtain $$px^2+(1-p)y^2-(px+(1-p)y)^2 =E[(X+y)^2]-E^2[X+y]=VAR(X+y)=VAR(X) =p(1-p).$$