I am trying to understand a few things about sequential and simultaneous optimization in [1]. In this post, it is shown that
$$\max_{x} \max_{y} f(x,y) = \max_{x,\ y} f(x,y).\tag{1}$$
Thanks to @Shiv Tavker's comment, I understand that in order to get the optimal point $z^*=(x^*, y^*)$ on the RHS of $(1)$, we have to solve the system $\nabla_z f(z) = 0$ w.r.t. $z$, where $z = (x,y)$.
In addition, from the LHS of $(1)$ we have $${x^*}' = \arg \max_x f(x, y)\tag{2}$$ and $${y^*}' = \arg \max_y f({x^*}', y).\tag{3}$$ Let ${z^*}' = ({x^*}', {y^*}')$.
As far as I understand, given $(1)$, we can state that ${z^*}' \equiv {z^*}$. However, I am wondering whether there are any cases in which ${z^*}' \equiv {z^*}$ does not hold. Could someone please comment or give an answer if things are not so simple? Any help is highly appreciated.
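To make the question concrete, here is a minimal sketch on a toy function. The function $f(x,y) = -(x-y)^2 - (y-1)^2$, the grid, and the starting value `y0` are assumptions chosen for illustration only; they are not taken from [1]. The sketch compares nested maximization, where the inner maximizer is kept as a function of the outer variable, with a single sweep in which $y$ is frozen at an arbitrary value before maximizing over $x$.

```python
# Toy function (an assumption for illustration): joint maximizer is (1, 1).
def f(x, y):
    return -(x - y) ** 2 - (y - 1) ** 2

def argmax_y(x):
    # Inner problem max_y f(x, y): df/dy = 2(x - y) - 2(y - 1) = 0  =>  y = (x + 1) / 2
    return (x + 1) / 2

def argmax_x(y):
    # max_x f(x, y) for a fixed y: df/dx = -2(x - y) = 0  =>  x = y
    return y

# Nested maximization: substitute y*(x) into f, then maximize over x on a grid.
grid = [i / 100 for i in range(-300, 301)]
x_nested = max(grid, key=lambda x: f(x, argmax_y(x)))
y_nested = argmax_y(x_nested)

# Single sweep with y frozen at an arbitrary y0 (a different procedure).
y0 = 3.0
x_sweep = argmax_x(y0)       # depends on the frozen y0
y_sweep = argmax_y(x_sweep)  # then maximize over y at that x

print("nested:", (x_nested, y_nested))
print("sweep :", (x_sweep, y_sweep))
```

In this toy case the nested version lands on the joint maximizer, while the single sweep depends on `y0`, so the two only agree if the dependence of one maximizer on the other variable is kept.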
EDIT1: Let $$f(x, y) = -\frac{1}{2}\:\mathbf{z}^T \left(\mathbf{A} + \frac{xy}{2} \:\mathbf{I}\right)^{-1} \mathbf{z} - \frac{y}{6} \lambda - \frac{y}{12}x^3,$$ where $x\geq 0$ and $y > 0$, while $\lambda > 0$, the real symmetric positive semidefinite matrix $\mathbf{A}$, and the vector $\mathbf{z}$ are fixed.
EDIT2: Let $\mathbf{t}(x,y) = - (\mathbf{A} + 0.5 x y\: \mathbf{I})^{-1} \mathbf{z}$. If we first solve $\partial_x f(x,y) = 0$, we get $$x = \| \mathbf{t}(x,y)\|\tag{4}$$ for $y >0$. Then, if we solve $\partial_y f(x,y) = 0$ and use $(4)$, we get $$\sqrt[3]{\lambda} = \| \mathbf{t}(x,y)\|.\tag{5}$$
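As a sanity check on the derivatives behind $(4)$ and $(5)$, here is a small sketch that compares them with central finite differences. Differentiating $f$ gives $\partial_x f = \frac{y}{4}\left(\|\mathbf{t}\|^2 - x^2\right)$ and $\partial_y f = \frac{x}{4}\|\mathbf{t}\|^2 - \frac{\lambda}{6} - \frac{x^3}{12}$, which reduce to $(4)$ and $(5)$ when set to zero. The data `a`, `z`, `lam` and the test point are hypothetical, and $\mathbf{A}$ is assumed diagonal to keep the code free of linear-algebra libraries.

```python
import math

# Hypothetical data: A = diag(a) is assumed diagonal; z and lam are arbitrary.
a = [0.5, 1.0, 2.0]
z = [1.0, -2.0, 0.5]
lam = 0.7

def f(x, y):
    s = 0.5 * x * y
    quad = sum(zi * zi / (ai + s) for ai, zi in zip(a, z))
    return -0.5 * quad - y * lam / 6.0 - y * x ** 3 / 12.0

def t_norm(x, y):
    # ||t(x, y)|| with t = -(A + 0.5*x*y*I)^{-1} z for diagonal A
    s = 0.5 * x * y
    return math.sqrt(sum((zi / (ai + s)) ** 2 for ai, zi in zip(a, z)))

def df_dx(x, y):
    # Zero exactly when x = ||t(x, y)|| (for y > 0), i.e. equation (4)
    return 0.25 * y * (t_norm(x, y) ** 2 - x ** 2)

def df_dy(x, y):
    # With x = ||t||, this reduces to (x^3 - lam) / 6, i.e. equation (5)
    return 0.25 * x * t_norm(x, y) ** 2 - lam / 6.0 - x ** 3 / 12.0

def central_diff(g, u, h=1e-6):
    return (g(u + h) - g(u - h)) / (2.0 * h)

x0, y0 = 0.8, 1.3  # arbitrary test point
fd_x = central_diff(lambda u: f(u, y0), x0)
fd_y = central_diff(lambda u: f(x0, u), y0)

print("d/dx: analytic", df_dx(x0, y0), "finite diff", fd_x)
print("d/dy: analytic", df_dy(x0, y0), "finite diff", fd_y)
```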
Next, suppose that we first solve $x = \| \mathbf{t}(x,y)\|$ w.r.t. $x$ to get an optimal ${x^*}'$ and then solve $\sqrt[3]{\lambda} = \| \mathbf{t}({x^*}', y)\|$ w.r.t. $y$ to get ${y^*}'$. Can we say that ${z^*}' \equiv {z^*}$?
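For a numerical feel, here is a sketch that solves $(4)$ and $(5)$ together. Since both equal $\|\mathbf{t}(x,y)\|$, they give $x = \sqrt[3]{\lambda}$, and $y$ then follows from $(5)$ by bisection. The data `a`, `z`, `lam` and the bracket for $y$ are assumptions for illustration, with $\mathbf{A}$ taken diagonal. The sketch only locates a stationary point; it does not check whether that point is the maximizer.

```python
import math

# Hypothetical data: A = diag(a) is assumed diagonal; z and lam are arbitrary.
a = [0.5, 1.0, 2.0]
z = [1.0, -2.0, 0.5]
lam = 0.7

def t_norm(x, y):
    # ||t(x, y)|| with t = -(A + 0.5*x*y*I)^{-1} z for diagonal A
    s = 0.5 * x * y
    return math.sqrt(sum((zi / (ai + s)) ** 2 for ai, zi in zip(a, z)))

# Combining (4) and (5): x = ||t|| = lam^(1/3)
x_star = lam ** (1.0 / 3.0)

# ||t(x_star, y)|| decreases in y here, so bisect on y.
# Assumed bracket: t_norm(x_star, lo) > x_star > t_norm(x_star, hi).
lo, hi = 0.0, 100.0
for _ in range(200):
    mid = 0.5 * (lo + hi)
    if t_norm(x_star, mid) > x_star:
        lo = mid
    else:
        hi = mid
y_star = 0.5 * (lo + hi)

# Residuals of the two stationarity conditions at (x_star, y_star)
res_x = x_star - t_norm(x_star, y_star)
res_y = 0.25 * x_star * t_norm(x_star, y_star) ** 2 - lam / 6.0 - x_star ** 3 / 12.0

print("x* =", x_star, "y* =", y_star)
print("residuals:", res_x, res_y)
```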
$$f(x, y) = -\frac{1}{2}\:\mathbf{z}^T \left(\mathbf{A} + \frac{xy}{2} \:\mathbf{I}\right)^{-1} \mathbf{z} - \frac{y}{6} \lambda - \frac{y}{12}x^3,$$
where $\lambda$, $\mathbf{A}$, and $\mathbf{z}$ are fixed. I was hoping for a more general question, which is why I didn't mention it in the main post. Any help is highly appreciated.
– darkmoor Mar 17 '22 at 13:59