(Proximal) subgradient inclusion property proof

Question

I'm having a bit of trouble proving what seems to be two fairly straightforward statements for a nonlinear optimisation class I'm taking. We're studying properties of the proximal subgradient, $\partial_p(f) = \{ v\ |\ \exists \rho, \delta: f(y) \geq f(x) + \langle v, y-x \rangle - \frac{\rho}{2}||y-x||^2\ \forall y \in B_\delta(x)\}$

The two properties I'm attempting to prove are as follows:

If $f \in C^1(\mathbb{R}^n)$, then $\partial_p(f) \subseteq \{\nabla f\}$
If $f \in C^2(\mathbb{R}^n)$, then $\partial_p(f) = \{\nabla f\}$

I've managed to make a good start, I think, for the $C^1$ case -- replace $f(y)$ with $f(x) + \langle \nabla f(x), y-x\rangle + o(||y-x||^2)$, divide through by $||y-x||$ and take $lim_{y \rightarrow x}$. This takes me to $\langle \nabla f(x) - v, \hat{h}\rangle \geq 0$ (where $\hat{h}$ is some unit-length vector), which gives me that $\nabla f(x) = v$ -- however, this gives me equality, not inclusion!

Would someone be able to tell me what I'm doing wrong, or what piece I'm missing? At this point I'm mildly confused, which is preventing me from making a start on the $C^2$ case!

I think what you did for the $C^1$ case can only be done if you know $f$ is in $C^2$. The Taylor remainder requires that second derivative. — Evan, Aug 25 '13 at 05:04
But then don't you end up with a $\langle \nabla^2 f, 0\rangle$ term? — Ben Stott, Aug 25 '13 at 05:22
For $C^2$ functions you can only get up to the linear term $f(x+h) = f(x) + \nabla f^T h + O(|h|_2^2)$. The second derivative term is used for the remainder $O(|h|_2)$ (should be big O). For $C^1$ you'll probably need some sort of mean value theorem. http://en.wikipedia.org/wiki/Taylor's_theorem#Taylor.27s_theorem_for_multivariate_functions — Evan, Aug 25 '13 at 05:37
Hmm, so the hint for our question is that for the $C^1$ case we should use a first-order Taylor expansion, and the $C^2$ case will need a second-order expansion (and then show $\nabla f(x) \in \partial_p f(x)$
According to the question it's the right way, but I'm happy to be shown that it's incorrect (though I'd be a little surprised) — Ben Stott, Aug 25 '13 at 05:41
Oops, deleted comment: So it is saying that the subgradient is either empty or has the gradient in it? Also, chasing the wikipedia article matches what I am saying. I think first-order is constant + remainder (which has a first order dependence). — Evan, Aug 25 '13 at 05:56
I believe the claim is that the subgradient is either empty, or contains only elements that are part of the set of all gradients.
I'll have a play with just the constant + remainder term, thanks. — Ben Stott, Aug 25 '13 at 10:10
(Also, if you wish to turn your comment r.e. first order Taylor expansions into an answer, I'll be happy to accept it! This seemed to be what I was missing.) — Ben Stott, Aug 26 '13 at 02:08
(edited previous) Actually, Okay, I can do that so it's not unanswered at least. — Evan, Aug 26 '13 at 02:10

Evan · Accepted Answer · 2013-08-27T02:29:20.150

(from comment) For $C^1$, you can only get $f(y) = f(x) + O(\|x-y\|_2)$, whereas for $C^2$ you have what you used, which is $f(y) = f(x) + \nabla f^T (y-x) + O(\|x-y\|_2^2)$. Reference at wikipedia entry for Taylor's Theorem

So it leaves the $C^1$ case just for you (Did you get it?)

I think you may want to use the remainder (in this case it's a type of Fundamental Theorem):

$f(y) - f(x) = \int_0^1 \nabla f(x+t(y-x))^T (y-x) dt$

Suppose there is a $v$ in the subgradient. We have

$\int \left\langle \nabla f(x+t(y-x)), y-x \right\rangle dt = f(y)-f(x) \geq \langle v,y-x \rangle - \rho/2 \|y-x\|^2$

From here you can do a few things to show that $v = \nabla f$. By the way, what this shows is that assuming something is even in the subgradient, it has to be $\nabla f(x)$. This actually shows inclusion. (PS I had slightly more work in a previous edit if you wanted to see how, but I feel you might not need it.)

So it seems that in the $C^2$, you have one more thing to prove: You have to show that $\nabla f(x)$ is in the subgradient. This means finding a $\rho$ and a $\delta$ such that the conditions of the subgradient holds, but it seems that the work is already there, you just need to be careful when you write the details of proof.

For instance, in your original post, you seem to assume the subgradient condition for some $v$, and then do a replacement and manipulation, and conclude that $v = \nabla f$, so this actually was showing inclusion.

I believe so: in $C^1$: $f(y) = f(x) + O(||y - x||)) \implies f(x) + O(||y - x||) \geq f(x) + \langle v, y - x\rangle - \frac{\rho}{2}||y - x||^2$, then dividing through by $||y - x||$ and letting $y \rightarrow x$ we end up with $\langle v, h \rangle \geq 0\ \forall\ h\ \implies v = 0$, which I believe is enough to then claim inclusion (since it can never be larger) — Ben Stott, Aug 26 '13 at 06:01
@BenStott When you divide $|y-x|$, it doesn't cancel the $O(|y-x|)$ part. I got $O(1) \geq \langle v, h \rangle$ for all $h$. Did you do something on top? Or is that enough? — Evan, Aug 27 '13 at 02:00
(sorry for the late response) You're right; that should be an $O(1)$ term that I don't think I accounted for. It seems that throughout the notes for this course, each Taylor expansion seems to use an $o(\cdot)$ term instead of an $O(\cdot)$ term. I guess I'll ask about this when I see my lecturer next! — Ben Stott, Sep 01 '13 at 06:40

(Proximal) subgradient inclusion property proof

1 Answers1