Gradient mapping minimization problem

Question

I am trying to solve this particular problem which can be found in this paper: $$\underset{y\in\Delta_n}{\text{argmin}}\{\langle\nabla f(x),y-x\rangle+\dfrac{1}{2}L\|y-x\|_1^2\}$$ I tried formulating the above problem as a projection problem: $$\underset{y\in\Delta_n}{\text{argmin}}\{\|y-x+\dfrac{1}{L}\nabla f(x)\|_1\},$$ but I don't know of any methods to compute projection for L1 norm. The method provided in the aforementioned paper (Section 5) started with $$\psi^*=\underset{x\in\Delta_n}{\text{min}}\{\langle\bar{g},x-\bar{x}\rangle+\dfrac{1}{2}L\|x-\bar{x}\|_1^2\}$$ and ended with $$-\psi^*=\underset{\tau\ge0}{\text{min}}\{\sum_{i=1}^{n}\bar{x}^{(i)}(\bar{g}^{(i)}-2\tau)_++\dfrac{\tau^2}{2L}\},$$ where $\left(a\right)_+=max\{a,0\}$.

I am not sure of how to construct the optimal solution $x$ from there.

Do you need to solve it using a $\ell_1$ projection? Otherwise I would restate the original problem as a quadratic problem with convex objective, and apply any QP methods you like. — user251257, Jul 28 '15 at 16:10
@user251257 Not necessarily have to solve using $\ell_1$ projection, just that I've used the algorithm for euclidean projection for the case when the norm is a $\ell_2$ norm. I would like to know more about the QP methods that you have in mind. — Sapphire, Jul 28 '15 at 16:57
using the trick $|x| \le u+v$ hold for any $x=u-v$, $u,v\ge 0$, you can turn the $\ell_1$ norm into a linear function of additional auxiliary variables thus, the objective becomes convex quadratic. the feasible set is some polyhedron. So it is a standard QP. if you don't need to implement it yourself, you should try some standard software packages, which usually implement such kind of transformation. — user251257, Jul 28 '15 at 17:10
@user251257 How do I go about using the method given in the paper to solve for the optimal solution? — Sapphire, Jul 29 '15 at 13:56
@user251257 In section 5 of the paper, they modified the original problem (refer to my question), but left the reconstruction of the optimal solution as an exercise. That's what I'm unsure about. — Sapphire, Jul 29 '15 at 14:14
@user251257 If possible, could you expand the quadratic formulation into an answer? I can't seem to get it into the quadratic formulation. — Sapphire, Jul 29 '15 at 17:13

score 1 · Answer 1 · answered Jul 29 '15 at 17:45

I give a convex quadratic program formulation as suggested in comments. Notice that you need an iterative approximation scheme to solve that QP like projected gradient or interior point method. The referenced paper indicates, it is possible solve the original problem with less effort.

Assume the setting is $\mathbb R^n$. Then, we can restate the problem as constrained quadratic program with convex objective

Assume that $x\in\mathbb R^n$ is fixed and let $g=\nabla f(x)\in\mathbb R^n$. Let $e\in\mathbb R^n$ denote the vector with only ones. Then, we have \begin{align} &\min \{ g^T(y-x) + \tfrac{1}{2} L \| y - x \|_1^2 \mid y\in\Delta_n \}\\ &= \min \{ g^Ts + \tfrac{1}{2} L \| s \|_1^2 \mid x+s\in \Delta_n \} \\ &= \min \{ g^T(u-v) + \tfrac{1}{2} L (\begin{smallmatrix}u \\ v\end{smallmatrix})^T Q (\begin{smallmatrix}u \\ v\end{smallmatrix}) \mid x+u-v\in \Delta_n,\, u,v\ge 0 \}, \\ \end{align} where $$ Q = \begin{pmatrix}e \\ e\end{pmatrix} \begin{pmatrix}e \\ e\end{pmatrix}^T. $$

I tried using quadratic programming on my problem set and it takes too much time, so I guess I have to use the method discussed in the paper. BTW, do you have any convenient algorithm to solve $\tau^*=\underset{\tau\ge0}{\text{argmin}}{\sum_{i=1}^{n}\bar{x}^{(i)}(\bar{g}^{(i)}-2\tau)++\dfrac{\tau^2}{2L}},$ where $\left(a\right)+=max{a,0}$? — Sapphire, Jul 30 '15 at 17:28
@Sapphire didn't the paper say, you only need to sort $\bar g$ to obtain the minimum? — user251257, Jul 30 '15 at 17:59

dohmatob · Accepted Answer · 2015-08-14T18:54:31.573

Remark: Let's first observe that the assumption $\sum_{1 \le i \le n}\bar{g}^{(i)}$ made by the author (Nesterov) is really no loss of generality; otherwise simply replace each $\bar{g}^{(i)}$ with $\bar{g}^{(i)} - \frac{1}{n}\sum_{1 \le k \le n}\bar{g}^{(k)}$ in the computations. BTW, though the author's, assumption is sufficient to obtain the results, it's much 'tighter' to assume $\underset{1 \le k \le 1}{\text{min }}\bar{g}^{(k)} = 0$, since $\lambda \le \bar{g}^{(i)} + \tau\text{ } \forall i$ iff $\lambda \le \underset{1 \le k \le 1}{\text{min }}\bar{g}^{(k)} + \tau$, and the author would want to work with the constraint $\lambda \le \tau$.

Now, sort the components of $\bar{g}$ so that $\bar{g}^n \le ... \le \bar{g}^2 \le \bar{g}^1$. In the optimization problem for $\tau$, the objective function $h(\tau) := \sum_{1 \le i \le n}\bar{x}^i(\bar{g}^i - 2\tau)_+ +\frac{\tau^2}{2L}$ is piecewise quadratic with kinks (singularities) at the points $\tau_k = \frac{1}{2}\bar{g}^k$, $k=1, 2, ..., n$, which partition the open set $\mathbb{R}\setminus\{\tau_k | k=1, 2, ..., n\}$ into $n + 1$ open intervals $I_0 := (\tau_1, +\infty)$; $I_k := (\tau_{k + 1}, \tau_{k})$ if $1 \le k < n$; $I_n := (-\infty, \tau_n)$. It is obvious that $\forall k=0,1, 2, ..., n$ and $\forall \tau \in I_k$, it holds that$h(\tau) = \sum_{i=1}^k\bar{x}^i(\bar{g}^i - 2 \tau) + \frac{\tau^2}{2L}$, and so the nonsingular critical points of $h$ are $\zeta_k := 2L\sum_{i=1}^k\bar{x}^i$ (of course, provided $\zeta_k \in I_k$). To compute $\tau^* := \underset{\tau \ge 0}{\text{argmin }}h(\tau)$, it then suffices to check the values value of $h$ on the finite set of points \begin{eqnarray}P := \{\tau_k | 1 \le k \le n, \tau_k \ge 0\} \cup \{\zeta_k | 1 \le k \le n, \zeta_k \in I_k, \zeta_k \ge 0\},\end{eqnarray} to get the global minimum point $\tau^*$. Note that $1 \le card(P) \le 2n + 1$. Now, as explained in the paper, we use this $\tau^*$ and (the original unsorted version of) $\bar{g}$ to calculate $s^*$ via the point-wise max operation $s^* = \text{max}(\tau^*, \bar{g} - \tau^*) \in \mathbb{R}^n_+$.

Finally, we minimize a linear function on a simplex to obtain $x^*$, as explained in the paper.

A note on the proof of Lemma 6. It's noteworthy that one can issue a one-line proof for Lemma 6 by simply observing that the RHS is precisely the value of the convex conjugate of the function $E^* \rightarrow \mathbb{R}, s \mapsto \frac{1}{2}L(\|s-g\|^*)^2$ at the point $h \in E$. Using elementary properties of conjugation, this value equals $L \frac{1}{2}\|\frac{1}{L}h\|^{\frac{2}{2-1}} + \langle g, h\rangle = \langle g, h\rangle + \frac{1}{2L}\|h\|^2$, which is the LHS.

I think you're minimizing the $\ell_2$ norm squared. The $1$ subscript is only there to refer to the space $E_1$ in which you are working. — dohmatob, Jul 30 '15 at 12:41
This is really some confusing notation (and there many more) from the author of the paper. — dohmatob, Jul 30 '15 at 12:47
I think that the norm is referring the the $\ell_1$ norm. Section 5 of the paper addresses this problem. — Sapphire, Jul 30 '15 at 12:55
BTW, based on what I understand from Section 5, the way to reconstruct the optimal solution $x^$ is to solve for the optimal $\tau$ first, and then use it to solve $s_^{(i)}=\max{\lambda,\bar{g}^{(i)}-\tau}$, and finally solve for $x^*$ using linear programming methods. Am I right? — Sapphire, Jul 30 '15 at 12:59
Oops! Yeah, you're right it's an $\ell_1$ right there. I didn't notice it (I only scanned the section diagonally). Ignore my answer above (modified), it doesn apply to your problem directly. about the way to go about solving for $x$, yes that's the wa to go: solve for $\tau$ (via a kind of binary search), then sompute $s^*$, then $x$. — dohmatob, Jul 30 '15 at 14:07

Gradient mapping minimization problem

2 Answers2