0

I am reading the book Introduction to Nonlinear Optimization by Amir Beck. This problem is the problem 9.2 in his book.

Problem: Consider the Huber function $$ H_{\mu} = \begin{cases} \frac{\|x\|^2}{2\mu}, & \|x\| \leq \mu, \\ \|x\| - \frac{\mu}{2}, & else. \end{cases} $$ where $\mu > 0$ is a given parameter. Show that $H_{\mu} \in C^{1,1}_{\frac{1}{\mu}}$.

($f \in C^{1,1}_{\frac{1}{\mu}}$ iff for $\forall x,y$, $\|\nabla f(x) - \nabla f(y)\| \leq \frac{1}{\mu}\|x-y\|.$)

Maybe it is an easy problem, but I somewhat got stuck...

MY ATTEMPT:

  1. If $0 < \mu \leq \frac{1}{2}$, $$ \nabla^2 H_{\mu}(x) = \begin{cases} \frac{1}{\mu}I, & \|x\| \leq \mu, \\ 2I, & else. \end{cases} $$ then, $\|\nabla^2 H_{\mu}\| \leq \frac{1}{\mu}$, this is equivalent to $H_{\mu} \in C^{1,1}_{\frac{1}{\mu}}$.

  2. If $\mu > \frac{1}{2}$, I do not know how to deal with this case.

Could anyone give me some hints? Or you may feel free to point out my mistakes in the proof. Any help will be appreciated.

Corrected Version: I found that I calculated the gradient wrongly, it should be $$ \nabla H_{\mu}(x) \begin{cases} \frac{x}{\mu}, & \|x\| \leq \mu,\\ 1, & else. \end{cases} $$ Thus, $$ \nabla^2 H_{\mu}(x) = \begin{cases} \frac{1}{\mu}I, & \|x\| \leq \mu,\\ 0, & else. \end{cases} $$ Hence, $\|\nabla^2 H_{\mu}\| \leq \frac{1}{\mu} \Rightarrow H_{\mu} \in C^{1,1}_{1/\mu}.$ $\square$

Siamese
  • 197
  • I think there might be a typo(?) in the definition of $H_\mu$; as defined it's not even cotninuous accros $| x|=\mu$. – Jose27 Apr 08 '22 at 13:40
  • @Jose27 Yeah, I have corrected that typo. – Siamese Apr 08 '22 at 14:06
  • Are you in $\mathbb R^n$ and is $|\cdot |$ the usual Euclidean norm? If so the gradient of $| x|$ isn’t $1$ it is $\frac{x}{|x|}$. – JackT Apr 10 '22 at 01:26
  • @JackT Yes, this book only deals with $\mathbb{R}^n$ case and if not indicated, the norm denotes the usual 2-norm. – Siamese Apr 10 '22 at 01:58

2 Answers2

1

Your solution is incorrect. You have correctly noted that $$\partial_{ij} H_\mu (x) = \frac 1 {\mu} \delta_{ij} \qquad \text{for all } x\in B_\mu. $$ where $B_r := \{ x\in \mathbb R^n \text{ s.t. } \| x\| <r\}$ and $\delta_{ij}$ is the Kronecker delta. However, the statement "$\nabla^2H_\mu=0$ for $x\in \mathbb R^n \setminus \overline{B_\mu}$" is not true. Indeed, \begin{align*} \partial_i \| x\| &= \partial_i \bigg ( \sum_{k=1}^n x_k^2 \bigg ) ^{1/2} \\ &= \frac 1 2 \bigg ( \sum_{k=1}^n x_k^2 \bigg ) ^{-1/2} \cdot 2x_i \\ &= \frac{x_i}{\| x \| } \end{align*}and, from the product rule, \begin{align*} \partial_{ij} \| x\| &= \frac{\delta_{ij}}{\| x\| } -\frac{x_i}{\| x\|^2}\partial_j \| x\|\\ &=\frac{\delta_{ij}}{\| x\| } -\frac{x_ix_j}{\| x\|^3}. \end{align*} Hence, for $x\in \mathbb R^n \setminus \overline{B_\mu}$, $$\partial_{ij}H_\mu= \frac{\delta_{ij}}{\| x\| } -\frac{x_ix_j}{\| x\|^3} $$ which you can alternatively write as $$\nabla^2H_\mu = \frac1{\| x\| }I -\frac1{\| x\|^3} ( x\otimes x )$$ where $\otimes$ denotes the outer product. (Be careful with the notation $\nabla^2$ for Hessian - often this symbol denotes the Laplacian which is the trace of the Hessian).

Next, we want to show that $\| \nabla^2 H_\mu \| \leqslant \frac 1 \mu$ where $\| \nabla^2 H_\mu \|$ denotes the operator norm of $\nabla^2 H_\mu $. For $x\in B_\mu$, $$\| \nabla H_\mu x \| = \frac 1 \mu \| x\|. $$ Moreover, for $x\in \mathbb R^n \setminus \overline{B_\mu}$, \begin{align*} \|\nabla^2 H_\mu x\|^2 &= \sum_{k=1}^n (\nabla^2H_\mu x)_k^2 \\ &= \sum_{k=1}^n \bigg ( \sum_{k=1}^n (\nabla^2H_\mu)_{jk} x_k \bigg )^2 \\ &=\sum_{k=1}^n \bigg ( \sum_{k=1}^n \frac{\delta_{jk}x_k}{\| x\| } -\sum_{k=1}^n\frac{x_jx_k^2}{\| x\|^3} \bigg )^2 \\ &=\sum_{k=1}^n \bigg ( \frac{x_j}{\| x\| } -\frac{x_j}{\| x\|} \bigg )^2\\ &=0. \end{align*} Thus, \begin{align*}\|\nabla^2 H_\mu x\| \leqslant &\begin{cases} 1/\mu , &\text{in }B_\mu \\ 0, & \text{in } \mathbb R^n \setminus \overline{B_\mu} \end{cases} \\&\leqslant \frac 1 \mu \end{align*} as required.


Edit: Details about $\| \nabla^2 H_\mu \|$: I had a look at the book you're following and since $ \nabla^2 H_\mu $ is matrix, $\| \nabla^2 H_\mu \|$ denotes the standard operator norm which is defined by $$ \| \nabla^2 H_\mu \| := \sup \bigg \{ \frac{\| \nabla^2 H_\mu x\|}{\| x\|} \text{ s.t. } x\in \mathbb R^n \setminus \{0\}\bigg \}. $$ (I'm using the notation from the book which I'm not a huge fan of since now when I write $\| \nabla^2 H_\mu x\|$ this is refering to the Euclidean norm in $\mathbb R^n$). If a matrix $A$ is diagonalisable then $\| A \|$ is equal to the maximum of it's eigenvalues, see this post for example. Since $\nabla^2 H_\mu$ is symmetric, it's diagonalisable so the definition you cite is equivalent. However, as you mention, your definition is not so easy to work with when you actually need to compute things.

JackT
  • 6,854
  • Thanks, I get you point. Could you explain more on the calculation of $|\nabla^2 H_{\mu}x|^2$? (btw: I guess there are some typos here, maybe you missed some $\nabla^2$...). From my point of view, to calculate $|\nabla^2 H_{\mu}x|$, one need to find the maximum of its eigenvalues, which might be quite difficult. My idea is to use $|x|1 \geq |x|_2$, it is easy to see that $|\nabla^2 H{\mu}x|_1 \leq \frac{1}{\mu},$ then we are done. – Siamese Apr 10 '22 at 08:34
0

Notice that the Huber function is the Moreau envelope (with parameter $\mu$) of the $\ell^1$ norm. The Moreau envelope (with parameter $\mu$) is always $\mu^{-1}$-Lipschitz-smooth (gradient is Lipschitz-continuous).

More formally, let $f(x) = \|x\|_1$ and define the Moreau envelope to be

$$f^\mu(x) := \min\limits_{u\in\mathbb{R}^n} \{f(u) + \frac{1}{2\mu}\|x-u\|_2^2\}$$

The gradient of the Moreau envelope is given by

$$\nabla f^{\mu}(x) = \frac{x - \mathrm{prox}_{\mu f}(x)}{\mu}$$

Now recall that the prox operator of a closed convex proper function is $1$-Lipschitz continuous and use the Moreau decomposition to find

$$\nabla f^{\mu}(x) = \frac{1}{\mu}\mathrm{prox}_{(\mu f)^*}(x)$$

which is $\frac{1}{\mu}$-Lipschitz continuous.