Clarification on computation complex optimization

Question

This might be a question with a very simple solution but I really don't see where this comes from. In the book on Convex optimization by Boyd and Vandenberghe in the section of examples on duality, the author compute a norm approximation problem. Particularly, they write the norm minimization problem $$ \text{minimize } \|Ax - b\|$$ as \begin{equation} \begin{aligned} & \text{minimize} & & \|y\|\\ & \text{subject to} & & Ax - b = y \end{aligned} \end{equation}

From here, I know the dual would have the form $L(x,y,\nu) = \|y\| + \nu^T(Ax - b -y) = \|y\| + \nu^T(Ax - b) - \nu^T y$. Now, the authors claim that the dual problem ends up being

\begin{equation} \begin{aligned} & \text{minimize} & & b^T\nu\\ & \text{subject to} & & \|\nu\|_* \leq 1 \\ & & A^T\nu = 0 \end{aligned} \end{equation}

I thought the objective function in the dual would be $-\nu^T b$ and the second constraint would be $\nu^TA$. Can anyone give me a mathematical reason of why they take the transpose of these expressions? Also, why is the sign for the objective function positive and not negative. Thanks for your help!

Brian Borchers · Accepted Answer · 2017-12-07T04:00:02.840

With the constraints $Ax-b=y$, you have a choice to write the Lagrangian with (subtracting the right side from the left)

$\nu^{T} ((Ax - b) - y)$

or (subtracting the left side from the right side of the equation)

$\nu^{T} (y- (b-Ax))$

where switching from one version to the other simply changes the sign of $\nu$. If you use the second version, you'll end up with

$\min b^{T}\nu$

in your objective function. The dual constraints are all sign-invariant, so this doesn't change anything else in the statement of the dual problem.

Keep in mind that this pattern can show up any time you have linear equality constraints in your primal problem. Different authors often make different choices, so you should be comfortable switching back and forth.

The constraints $A^{T}\nu=0$ and $\nu^{T}A=0$ are exactly the same linear constraints because $(A^{T}\nu)^{T}=\nu^{T}A$. In the first case, they're written in the form of a matrix times a column vector equal to a column vector of 0's. In the second case, they're written as a row vector times a matrix equal to row vector of 0's. It's more conventional to write this at $A^{T}\nu=0$.

I've gotten similar questions from my students many times over the years- you're not alone in struggling with these kinds of issues. However, they really do boil down to simple linear algebra and if you work at it, you should become comfortable with these kinds of minor algebraic differences.

Thanks! This is really helpful, I have taken a course in linear algebra but don't really remember covering these kind of subtleties, is there any particular subset of linear algebra that might be helpful to get more comfortable with this type of operations? Or maybe some chapters of books that you recommend for this? Or does this just improve as you do more and more exercises? Thanks! — user110320, Dec 07 '17 at 04:02
I'd encourage you to keep working with the convex optimization material. You seem to understand linear algebra well enough- it's more a matter of becoming quick and fluent in these kinds of manipulations that occur all the time in convex optimization. — Brian Borchers, Dec 07 '17 at 04:05

Clarification on computation complex optimization

1 Answers1