4

I am trying to understand a few things about sequential and simultaneous optimization in [1]. In this post, it is shown that

$$\max_{x} \max_{y} f(x,y) = \max_{x,\ y} f(x,y).\tag{1}$$

Thanks to @Shiv Tavker comment, I understand that in order to get the optimal value $z^*=(x^*, y^*)$ in the RHS of $(1)$, we have to solve the system $\nabla_z f(z) = 0$ w.r.t. $z$, where $z = (x,y)$.

In addition, from the LHS of $(1)$ we have, $${x^*}' = \arg \max_x f(x, y)\tag{2}$$ and $${y^*}' = \arg \max_y f({x^*}', y)\tag{3}.$$ Let ${z^*}' = ({x^*}', {y^*}')$.

As far as I understand, I think that given $(1)$, we can state that ${z^*}' \equiv {z^*}$. However, I am thinking if there are any cases that ${z^*}' \equiv {z^*}$ does not hold? Could you please someone give some comments or an answer of things are not so simple? Any help is highly appreciated.

EDIT1: Let $$\mathcal{f}(x, y) = -\frac{1}{2}\:\mathbf{z}^T \left(\mathbf{A} + \frac{xy}{2} \:\mathbf{I}\right)^{-1} \mathbf{z} - \frac{y}{6} \lambda - \frac{y}{12}x^3,$$ where $x\geq 0$, $y,\lambda > 0$, $\mathbf{A}$ a real symmetric positive semmi-definite, and $\mathbf{z}$ are fixed.

EDIT2: Let $\mathbf{t}(x,y) = - (\mathbf{A} + 0.5 x y\: \mathbf{I})^{-1} \mathbf{z}$. If we first solve $\partial_x f(x,y) = 0$ we get $$x = \| \mathbf{t}(x,y)\|\tag{4},$$ for $y >0$. Then, if we solve $\partial_y f(x,y) = 0$ and use $(4)$ we get $$\sqrt[3]{\lambda} = \| \mathbf{t}(x,y)\|.\tag{5}$$

Next, suppose that we first solve $x = \| \mathbf{t}(x,y)\|$ w.r.t. $x$ to get an optimal ${x^*}'$ and then solve $\sqrt[3]{\lambda} = \| \mathbf{t}({x^*}', y)\|$ w.r.t. $y$ to get ${y^*}'$. Can we say that ${z^*}' \equiv {z^*}$?

darkmoor
  • 793
  • What would $\max_y f(x,y)$ mean? – podiki Nov 01 '21 at 21:27
  • 1
    Yes, you can set the derivative to be zero for both of the variables simultaneously. Note that.. this is same as setting gradient, $\nabla f = 0$. However, beware you may end up in local optimas. – Shiv Tavker Nov 02 '21 at 11:54
  • 1
    Do you mean: $${x^}'(y) = \arg \max_x f(x, y) \qquad (2)$$ $${y^}' = \arg \max_y f({x^}'(y), y) \qquad (3).$$ Let ${z^}' = ({x^}'({y^}'), {y^*}')$. – River Li Mar 17 '22 at 12:07
  • @RiverLi thanks for the interest. Yes this is my case. The solution ${x^*}'$ depends on $y$. – darkmoor Mar 17 '22 at 13:36
  • @RiverLi actually my equation is

    $$\mathcal{f}(x, y) = -\frac{1}{2}:\mathbf{z}^T \left(\mathbf{A} + \frac{xy}{2} :\mathbf{I}\right)^{-1} \mathbf{z} - \frac{y}{6} \lambda - \frac{y}{12}x^3,$$

    where $\lambda$, $\mathbf{A}$, and $\mathbf{z}$ are fixed. I was hopping for a more general question that is why I didn't mention it in the main post. Any help is highly appreciated.

    – darkmoor Mar 17 '22 at 13:59
  • 1
    @darkmoor I think you should go straight to the point (your $f(x, y)$) in the question. – River Li Mar 17 '22 at 14:05
  • @darkmoor Is $A$ real symmetric positive definite? Is $\lambda > 0$? Is the domain is $(0, \infty)^2$? Do you want to solve the optimization problem, or want to analyze if ${z^}' \equiv {z^}$ as you said in the question? – River Li Mar 17 '22 at 14:54
  • @RiverLi thanks again for the interest, I did some edits in my post. Generally I am trying to solve $\max_{x,y} : f(x,y)$ by using $(1)$. That is by solving it sequentially which I fond more easy that solving it simultaneously which I found more difficult. So at first place I am interested to know if ${z^}' \equiv {z^}$. In any case, I am very grateful for any additional information. – darkmoor Mar 17 '22 at 15:29
  • @RiverLi I add some more details in my post. I hope it is more clear now of what I am asking. – darkmoor Mar 17 '22 at 16:08
  • @podiki $\max_y f(x,y)$ is the maximum with respect to $y$ for a generic fixed $x$. It is therefore a function of $x$. – PierreCarre Mar 18 '22 at 16:27

1 Answers1

3

If $A$ is positive definite then $A=X^TDX$, where $D$ is a diagonal matrix with the eigenvalues $\lambda_i$ of $A$. Hence $$z^T\left(A+\frac{xy}{2}I\right)^{-1}z=(Xz)^T\begin{pmatrix} \lambda_1+\frac{xy}{2} & 0&\cdots&0 \\ 0 &\lambda_2+\frac{xy}{2}&\cdots&0 \\\vdots&\vdots&\ddots&0\\0&0&\cdots\lambda_n+\frac{xy}{2} \end{pmatrix}^{-1}(Xz)=\sum_{i=1}^nw_i^2\left(\lambda_i+\frac{xy}{2}\right)^{-1},$$ with $w=Xz$.

Otherwise, if you denote $$M=A+\frac{xy}{2}I,$$ you can see that $MM^{-1}=I$ and follows that $$\frac{\partial\,M}{\partial\,x}M^{-1}+M\frac{\partial\,M^{-1}}{\partial\,x}=0.$$ You find that $$\frac{\partial\,M^{-1}}{\partial\,x}=-\frac{y}{2}\left(M^{-1}\right)^2.$$ Hence, if $$\frac{\partial\,f(x,y)}{\partial\,x}=\frac{y}{4}z^T\left(M^{-1}\right)^2z-\frac{y}{4}x^2=0.$$ The condition on $y>0$ gives you $$x^2=z^T\left(M^{-1}\right)^2z.$$

If you also have $$\frac{\partial\,f(x,y)}{\partial\,y}=\frac{x}{4}z^T\left(M^{-1}\right)^2z-\frac{\lambda}{6}-\frac{1}{12}x^3=0.$$

It follows that $$\frac{x}{4}x^2-\frac{\lambda}{6}-\frac{1}{12}x^3=0.$$ And you have $x^*=\sqrt[3]{\lambda}$.

This helps you to handle with the equation $\nabla f(x,y)=(0,0)$.

Remark:

  1. Equation (2) probably gives you implicit curves, say $X(x,y)=0$, hence your equation (3) looks to becomes a constrained optimization problem, which can be harder to solve (this thread can be useful). If this last problem has only one solution then this thread imply that it is the one you are looking for.

  2. In a general case equations $\partial_x f(x,y) = 0$ and $\partial_y f(x,y) = 0$ probably gives implicit curves to you, say $u(x,y)=0$ and $v(x,y)=0$. This gives you a system of equations, with two equations and two variables, that you can try to solve numerically, like with Newton’s method or other that you can find on SearchOnMath.

  3. The set of solutions of $X(x,y)=0$ is a subset of the set of solution of $u(x,y)=0$.

  • Thanks for the interest. Could you please elaborate a little more on how this helps to handle the equation $\nabla f(x,y)=(0,0)$ for entire $f(x,y)$? – darkmoor Mar 17 '22 at 17:51
  • Indeed, I think that the equation $\nabla f(x,y)=(0,0)$ can be changed to a system of two equations, or two polynomial equations, in two variable. Probably you will need a numerical method to solve them. – José C Ferreira Mar 17 '22 at 18:03
  • Could you provide a more detailed answer on the whole problem or elaborate on ${z^}' \equiv {z^}$? – darkmoor Mar 17 '22 at 18:18
  • I did some change in my answer. I am trying to find an example to clarify my point. – José C Ferreira Mar 17 '22 at 23:59
  • In this example I want to show that $\min_xf(x,y)$ is a number, in this case is $0$. But the $\arg\min_xf(x,y)$ gives you the set $E$ containing both implicit curves $x^2+y^2-1=0$ and $x^2+xy-4=0$. On the other side, the equation $\partial_xf(x,y)=0$ has a bigger set of possible solution. – José C Ferreira Mar 18 '22 at 13:35
  • Why $X(x, y) = 0$ is a function of $x$ and $y$ and not only a function of $y$ as the optimal $x^*$ is fixed? Also, why the aforementioned analysis helps to deal with the simultaneous optimization $\nabla f(x,y)=(0,0)$? Last by not least, could you please elaborate a little more on the Remark 3.? – darkmoor Mar 21 '22 at 11:13
  • I understand that your equation (2) is in a general context. So the aforementioned example show you that $X(x,y)=0$ can gives you more that one implicit curve. And even one impicit curves can gives you more then one function, say $x(y)$. And, in general, $x^*$ is unknown at this point. – José C Ferreira Mar 21 '22 at 11:38
  • In general, the equation $\nabla f(x,y)=(0,0)$ is a way to find possible local minimizing points. It can gives you saddle point, local minimum or maximal points, for instance. – José C Ferreira Mar 21 '22 at 11:43
  • But, in your particular function given by EDIT1, equation $\nabla f(x,y)=(0,0)$ gives you only one $x=x^*=\sqrt[3]{\lambda}$. It remain to find possible values to $y$ in the equation $(\sqrt[3]{\lambda})^2=z^T\left((A+\frac{\sqrt[3]{\lambda}y}{2}I)^{-1}\right)^2z$ and test it to exclude possible non minimal points of $f(x,y)$. @darkmoor – José C Ferreira Mar 21 '22 at 11:54
  • So for some $\lambda$, we can compute $x=x^*=\sqrt[3]{\lambda}$ and then solve somehow the non-linear equation? Will we take the same solution if we have solved somehow $\nabla f(x,y)=(0,0)$ simultaneously? – darkmoor Mar 21 '22 at 12:02
  • Yes. You can think about the hessian matrix to test local minimum points. – José C Ferreira Mar 21 '22 at 12:04
  • Unfortunately, I do not catch your comment. Can you please be a little more precise on my two comment questions? Thanks. – darkmoor Mar 21 '22 at 12:07