2

I'm learning about ADMM by reading Boyd's paper Distributed Optimization and Statistical Learning via the Alternating Direction Method of Multipliers.

The paper says that ADMM is an improvement over the Method of Multipliers because the latter requires the minimization of the augmented Lagrangian:

$L_\rho(x,y) = f(x) + y^T (Ax − b)+(\rho/2)\lVert Ax − b\rVert^2_2$

which we cannot minimize in parallel because the added quadratic penalty term makes the function non-separable.

However, the paper says that ADMM requires minimization of the augmented Lagrangian functions:

$x^{k+1} = \underset{x}{\operatorname{argmin}} L_{\rho}(x,z^k,y^k)$

$z^{k+1} = \underset{z}{\operatorname{argmin}} L_{\rho}(x^{k+1},z^k,y^k)$

where the augmented Lagrangian is defined as

$L_{\rho}(x,z,y) = f(x) + g(z) + y^T(Ax + Bz − c)+(\rho/2)\lVert Ax + Bz − c\rVert^2_2$

The paper says that ADMM is an improvement because it has decomposability unlike the Method of Multipliers. However, shouldn't the augmented Lagrangian for ADMM also be non-separable because it also contains a quadratic penalty term $(\rho/2)\lVert Ax + Bz − c\rVert^2_2$ at the end? What allows it to be more decomposable?

  • 1
    By definition, ADMM solves for x and z one after another, not because that gives the same answer as optimizing for (x, z) jointly, but because convergence theory shows that it's good enough. – Bananach Jan 12 '23 at 09:48

0 Answers0