I'm reading through a paper which, at some point, presents an optimization step applied to a function of the form:
$$ E = \sum_i \left|\alpha_i - \beta_i \right| $$
where $\alpha_i$ and $\beta_i$ are also functions; their specific form doesn't really matter. What is claimed is that the problem is solved using an L-BFGS method. I thought there was a mistake in the formula, but the authors took inspiration from another paper where the same formula is shown, the only difference being that the latter paper mentions a quasi-Newton method.
Now, as far as I know, for these methods the function needs to be at least twice differentiable, and $E$ isn't (in general it has kinks wherever $\alpha_i = \beta_i$).
Is there some form of trick that is usually applied? I know you can approximate $|\cdot|$ by a differentiable function (such as $f_n(x) = \frac{1}{n} \ln(\cosh(nx))$), but there's no mention of any approximation.
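To make the smoothing idea concrete, here is a minimal sketch of that approximation and its derivative, $f_n'(x) = \tanh(nx)$. The function names are my own, and I'm assuming $n > 0$; nothing here comes from either paper. The rewrite of $\ln\cosh$ is only there to avoid overflow for large $n|x|$.

```python
import math

def smooth_abs(x, n):
    """f_n(x) = ln(cosh(n*x)) / n, a smooth approximation of |x| (assumes n > 0)."""
    y = abs(n * x)
    # Identity for y >= 0: ln(cosh(y)) = y + ln(1 + exp(-2y)) - ln(2).
    # This form avoids evaluating cosh(y), which overflows for large y.
    return (y + math.log1p(math.exp(-2.0 * y)) - math.log(2.0)) / n

def smooth_abs_grad(x, n):
    """Derivative of f_n: tanh(n*x), a smooth stand-in for sign(x)."""
    return math.tanh(n * x)

# The gap |x| - f_n(x) lies in [0, ln(2)/n], so it shrinks as n grows.
for n in (1, 10, 100):
    print(n, abs(-1.5) - smooth_abs(-1.5, n), smooth_abs_grad(-1.5, n))
```

With this, each $|\alpha_i - \beta_i|$ would be replaced by $f_n(\alpha_i - \beta_i)$, which is smooth everywhere, at the cost of a bias of at most $\ln(2)/n$ per term.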
The other "trick" I can think of is just to take the derivative of the absolute value to be the signum function (regardless of what happens at 0).
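As a sketch of this second idea: by the chain rule, away from the kinks, $\nabla E = \sum_i \operatorname{sign}(\alpha_i - \beta_i)\,(\nabla\alpha_i - \nabla\beta_i)$, and at a kink one simply uses $\operatorname{sign}(0) = 0$. The toy below assumes a hypothetical scalar parameter $\theta$ with $\alpha_i(\theta) = a_i\theta$ and $\beta_i = b_i$; these forms and all names are made up for illustration and are not the ones in the papers.

```python
def sign(x):
    """Signum with sign(0) = 0, i.e. one particular choice of subgradient at the kink."""
    return (x > 0) - (x < 0)

def energy(theta, a, b):
    """E(theta) = sum_i |a_i * theta - b_i| for the toy residuals."""
    return sum(abs(ai * theta - bi) for ai, bi in zip(a, b))

def energy_subgrad(theta, a, b):
    """Chain rule with d|r|/dr taken to be sign(r): sum_i sign(r_i) * a_i."""
    return sum(sign(ai * theta - bi) * ai for ai, bi in zip(a, b))

# Hypothetical data; at theta = 0.3 no residual is zero, so E is differentiable there.
a = [1.0, 2.0, -1.0]
b = [0.5, 1.0, 2.0]
theta = 0.3

# Central finite difference as a sanity check on the sign-based gradient.
h = 1e-6
fd = (energy(theta + h, a, b) - energy(theta - h, a, b)) / (2 * h)
print(energy_subgrad(theta, a, b), fd)
```

Away from the kinks the two values agree; the open issue is only what a quasi-Newton update does with the gradient jumps when an iterate lands on or crosses a point where some $\alpha_i = \beta_i$.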
What I would do is something like what I mentioned above. My question is: is this what would have been done in practice?