3

I have an optimization problem of the form: \begin{alignat}{4} &\text{minimize}\quad &&f(x) \\ &\text{subject to} \quad && \|x\|_2 \leq \varepsilon \end{alignat}

with variable $x \in \mathbb{R}^n$ where $f$ is Lipschitz continuous.

I want to convert this problem to have only box constraints. So I want to convert this to:

\begin{alignat}{4} &\text{minimize}\quad &&f(g(z)) \\ &\text{subject to} \quad && z_i \in [a_i, b_i] \quad \text{for} \quad 1 \leq i \leq n \end{alignat}

Ideally $g$ should also be Lipschitz continuous. Any idea what function I can use to do this change of variable to parametrize the $L_2$ norm-ball? Is it easy to do so?

The reason I want to do this change of variable is to utilize the method in this paper that uses adaptive nested optimization: https://link.springer.com/article/10.1007/s10898-015-0355-7 .

Adrian Keister
  • 10,099
  • 13
  • 30
  • 43
fornit
  • 133
  • 6

1 Answers1

2

In general, I don't think it's very easy, but there are a few things you can do. Firstly $\|x\|_2 \leq \epsilon$ is equivalent to $\|x\|_2^2 \leq \epsilon^2$, that is

$$ \sum_{i=1}^{n}x_i^2 \leq \epsilon^2.\tag{1}\label{1} $$

You can then introduce $n$ auxiliary scalar decision variables, $t_1,\ldots, t_n$ and replace \eqref{1} by

\begin{align} |x_i| {}\leq{}& t_i, \\ \sum_{i=1}^{n}t_i {}\leq{}& \epsilon^2. \end{align}

In case you cannot handle $\sum_{i=1}^{n}t_i {}\leq{} \epsilon^2$ directly, you can associate with it the penalty function $\psi(t) = \max\{\sum_{i=1}^{n}t_i -\epsilon^2, 0\}$ and define the modified cost function

$$ \tilde{f}(x,t; \lambda) = f(x) +\lambda \psi(t). $$

This is a Lipschitz continuous and you can apply a penalty method on top of your algorithm (or an augmented Lagrangian method - you have to see which one works better). Simply take $\lambda \to \infty$ and stop once $\psi(t)$ becomes sufficiently small.

Let me just clarify that for each $\lambda$ you will have to solve the following problem

\begin{align} &\operatorname*{Minimize}_{x,t} \tilde{f}(x,t; \lambda) \\ &\text{subject to } |x_i| \leq t_i, \text{ for all } i =1,\ldots, n \end{align}

Update 1. Under the stronger assumption that $f$ is continuously differentiable with Lipschitz gradient, it turns out the the original problem is equivalent to the unconstrained minimization problem

$$ \operatorname*{Minimize}_{x} f(x) - \frac{\gamma}{2}\|\nabla f(x)\|^2 + \frac{\gamma}{2}\operatorname{dist}^2_D(x - \gamma \nabla f(x))^2,\tag{2}\label{2} $$

where $D$ is the set of box constraints (I'm using the same notation as the paper you cited), and $\operatorname{dist}^2_D$ is the squared distance from $D$, which is easy to compute. The cost function in \eqref{2} is known as the forward-backward envelope of the original problem (see for example this paper).

If $f$ is just Lipschitz the above does not apply. Generally speaking, this is too weak an assumption in optimization (with the exception of global optimization where people resort to branch-and-bound-type methods).

Update 2. You can use a homeomorphism between $B_2 := \{x\in\mathbb{R}^n: \|x\|_2 \leq 1\}$ and $B_\infty := \{x\in\mathbb{R}^n: \|x\|_\infty \leq 1\}$, i.e., a function $\phi: B_2\to B_{\infty}$ such as

$$ \phi: B_{\infty} \ni x \mapsto \phi(x) = \frac{\|x\|_\infty}{\|x\|_2}x \in B_2 $$

for $x\neq 0$ and $\phi(0) = 0$

The inverse is

$$ \phi^{-1}: B_2\ni z \mapsto \phi^{-1}(z) = \frac{\|z\|_2}{\|z\|_\infty}z \in B_{\infty} $$

for $z\neq 0$ and $\phi^{-1}(0) = 0$.

The constraints in your problem can be written as $\frac{x}{\epsilon} \in B_2$. Define $u = \phi^{-1}(x/\epsilon)$. Then $\frac{x}{\epsilon} = \phi(u) \Leftrightarrow x = \epsilon \phi(u)$. The constraints become

\begin{align} \frac{x}{\epsilon} \in B_2 \Leftrightarrow{}&{}\frac{x}{\epsilon} \in \phi (B_\infty) \\ \Leftrightarrow{}&{}\phi(u) \in \phi (B_\infty) \\ \Leftrightarrow{}&{}u \in B_{\infty} \end{align}

and the optimization problem becomes

\begin{align} &\operatorname*{Minimize}_u f(\epsilon \phi(u)) \\ &\text{subject to } -1 \leq u \leq 1 \end{align}

Once you find an optimal solution $u^\star$ (if one exists), you can retrieve $x^\star = \epsilon \phi(u^\star)$.

  • To follow up: would there be any way to minimize only for $x$. Because the method in the paper does not optimize the box constraints (i.e. $t$) ? – fornit May 09 '19 at 15:56
  • @fornit Let me try something... – Pantelis Sopasakis May 09 '19 at 16:05
  • @fornit I added two methods. The advantage of the first method is that you can get good performance. I'm skeptical about the second method. I'm interested in how this may perform in practice. It may be worth exploring other such mappings from $B_2$ to $B_\infty$. – Pantelis Sopasakis May 09 '19 at 17:02
  • What is the performance disdvantage of the second method? – fornit May 10 '19 at 10:45
  • @fornit You mean the method in "Update 1"? It involves very simple steps at every iteration. In SQP you would need to solve a QP and in IP methods you would need to solve a linear system at every iteration. It also uses quasi-Newtonian directions, which make it converge very fast and you have convergence guarantees for nonconvex problems (to a critical point). Using a global solver on top of this would be interesting. If you like we can talk offline. – Pantelis Sopasakis May 10 '19 at 10:52
  • Sorry, I meant Update 2. – fornit May 10 '19 at 11:00
  • 1
    @fornit Oh, sorry. The new cost function will be $\tilde{f}(u) := f(\epsilon \phi(u))$, which might be ill conditioned (this depends on $f$ and $\epsilon$ as well). But, you would need to test this, I guess. – Pantelis Sopasakis May 10 '19 at 11:08