Applying a quasi newton (L-BFGS) method to a non differentiable cost function.

Question

I'm reading through a paper which presents at some point an optimization step to a function of the form:

$$ E = \sum_i \left|\alpha_i - \beta_i \right| $$

where $\alpha_i$ and $\beta_i$ are also functions, it doesn't really matter the specific form. But what is claimed is that the problem is solved using a L-BFGS method. I thought there was a mistake in the formula but they took inspiration from another paper where the same formula is showed, with the difference in the latter paper a Quasi Newton method is mentioned.

Now... as far as I know the function needs to be at least twice differentiable, and the function $E$ isn't.

Is there some form of trick that is usually applied? I know you can approximate the $|\cdot|$ by a differentiable function (such as $f_n(x) = \frac{1}{n} \ln(\cosh(nx))$, but there's no mention to any approximation.

The other "trick" I can think of is just to compute the derivative of the step function as the signum function (regardless of what happens at 0).

What I would do is doing something like what I mentioned, my question is... is this what would have been done in practice?

Alex Shtoff · Answer 1 · 2019-03-06T15:49:46.963

2

The reasons are unknown yet, and nobody possesses a proof, but it turns out that in practice BFGS style algorithms do converge on many non-smooth problems too, as was pointed out by Prof. Adrian Lewis in this excellent talk on slides 3-4.

Update

It seems that some results explaining why quasi-newton methods work in this case are available. See, for example, this paper.

edited Mar 06 '19 at 15:49

answered Mar 06 '19 at 15:44

Alex Shtoff

12

I mentioned BFGS, but I think any sort of optimization algorithm (especially in engineering applications) are applied in that way. There's no paper I've read so far that actually proves "ok we optimize this function, let's actually prove we can use X to optimize it". They just apply the method, my personal theory is that as long as your function belongs to closure of some set of differentiable functions you can optimize it. – user8469759 Mar 06 '19 at 16:24
Interesting answer anyway. – user8469759 Mar 06 '19 at 16:25
@user8469759, that is indeed what many researchers and engineers do, and it frustrates me as well. But researchers in the optimization community are constantly attempting to find provable and reliable algorithms for yet unsolved families of problems, and this practice might change. – Alex Shtoff Mar 06 '19 at 20:20

Applying a quasi newton (L-BFGS) method to a non differentiable cost function.

1 Answers1