I am trying to show that the alternating minimization algorithm will converge to the global minimum for my application. Since I am not a mathematician, I would appreciate if someone can either improve my arguments or find flaws in those.
My AM algorithm generates a sequence of variables $\hat{δ^n}$, $\hat{x^n}$ at successive iteration steps as follows:
$$\hat{δ^n} = argmin_{δ} ̅(‖A_{MV}(\bar{δ})\hat{x}^{{(n-1)}} - \bar{y}‖^2 +μ_δ‖D\bar{δ}‖^2) ......................[A]$$
$$\hat{x}^{n} = argmin_{x} ̅(‖A_{MV}(\hat{δ^n})\bar{x} - \bar{y}‖^2 +μ_S‖D\bar{x}‖^2) ......................[B]$$
The subscript "MV" stands for multi voxel. It should be noticed that block diagonal matrix $A_{MV}(\bar{δ})$ consists of $A(δ_{Voxel})$ from individual voxels along its diagonal, with no off diagonal term. The matrix D is just a first difference operator.
Here, the expression in Eqn [B] is a L2-norm of a linear function in $\bar{x}$ and hence, a global convergence is guaranteed for this expression.
On the other hand, though the second term of Eqn [A] involves L2-norm of a linear function of $\bar{δ}$, the first term involves a L2-norm of a nonlinear function of $\bar{δ}$ and hence, one would expect the solution to converge to a local minimum. However, if we consider single voxel equivalent of first term i.e. $f_{voxel} = ‖A(δ_{voxel})x - y‖^2$, then its first derivative changes sign only once and the second derivative is positive at that point (refer the figure). Correspondingly, we find only one minima near δ0 (solution for no noise case). In other words, the function $f_{voxel}$ is unimodal, though it’s not convex everywhere. Consequently, we can conclude that the corresponding multi voxel equivalent i.e. $‖A(\bar{δ})\hat{x}^{{(n-1)}} - y‖^2$ is a multi-variate uni-modal with only one minimum and hence expression in [A] is also guaranteed to converge to globally.
