Minimize multi-variable function one variable at a time

Question

I am wondering if I can minimize a multi-variable function one variable at a time. In other words, is it true that:

$\min_{x_1,x_2} f(x_1,x_2)=\min_{x_1} \min_{x_2} f(x_1,x_2)$

see also this question: https://math.stackexchange.com/questions/453831/optimization-of-a-function-of-two-variables — Chill2Macht, Sep 03 '18 at 23:46
@Tom Bennett: The current accepted answer by Jaood is incorrect. Please change it for the benefit of general readers. — Akshay Bansal, Jul 29 '19 at 16:00
@TomBennett: Actually the equality holds for all possible circumstances but Jaood's answer denies that fact and states that under only certain circumstances the equality is true. — Akshay Bansal, Jul 30 '19 at 02:29
That is true. He said that. But his actual proof seems to work under all circumstances. — Tom Bennett, Jul 30 '19 at 18:54

score 4 · Answer 1 · answered May 22 '15 at 09:29

Suppose you want to minimize $$f(x_1, x_2)$$ At the solution, the two equations $$\frac{\partial f(x_1, x_2)}{\partial x_1}=\frac{\partial f(x_1, x_2)}{\partial x_2}=0$$ would be satisfied. For the method to work, you then need the derivative with respect to $x_1$ not to depend on $x_2$ and the derivative with respect to $x_2$ not to depend on $x_1$. In other words, this would work if $$\frac{\partial^2 f(x_1, x_2)}{\partial x_1\,\partial x_2}=0$$ You may be amazed to learn that $55$ years ago, when I started with scientific computing (the largest computer at that time probably had less power than a cell phone today!), this was one method that was used, but iteratively (even for many variables, hoping that the minimum would be unique).

The procedure was :

Fix $x_2$ at a given value. Then minimize $f$ with respect to $x_1$ and get $x_1$
Keep $x_1$ at the value found in the first step. Then minimize $f$ with respect to $x_2$ and get $x_2$
Go back to the first step as long as $x_2$ changes (up to a given tolerance)
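The three steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the test function `f`, the closed-form one-dimensional minimizers `argmin_x1` and `argmin_x2`, and the helper `alternate` are hypothetical names that are not part of the answer. The function is chosen with a nonzero mixed partial, so a single pass is not enough and the loop has to repeat.

```python
def f(x1, x2):
    # Hypothetical test function; its mixed partial d2f/dx1dx2 equals 1, not 0.
    return (x1 - 1) ** 2 + (x2 - 2) ** 2 + x1 * x2


def argmin_x1(x2):
    # Step 1: with x2 fixed, solve df/dx1 = 2*(x1 - 1) + x2 = 0 for x1.
    return 1.0 - x2 / 2.0


def argmin_x2(x1):
    # Step 2: with x1 fixed, solve df/dx2 = 2*(x2 - 2) + x1 = 0 for x2.
    return 2.0 - x1 / 2.0


def alternate(x2, tol=1e-10, max_iter=100):
    """Alternate the two 1-D minimizations until x2 stops changing."""
    x1 = argmin_x1(x2)
    for sweep in range(1, max_iter + 1):
        x1 = argmin_x1(x2)
        x2_new = argmin_x2(x1)
        # Step 3: stop once x2 changes by less than the tolerance.
        if abs(x2_new - x2) < tol:
            return x1, x2_new, sweep
        x2 = x2_new
    return x1, x2, max_iter


x1_opt, x2_opt, sweeps = alternate(x2=0.0)
print(x1_opt, x2_opt, sweeps)
```

For this particular `f` the joint first-order conditions give the minimizer $(0, 2)$, and the iterates approach it over several sweeps rather than in one.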

Claude Leibovici

Your algorithm is cool, but it seems different from mine. I am nesting two optimizers whereas yours iteratively minimizes the variables one at a time. Do you think with my approach the mixed partial has to be zero? – Tom Bennett May 22 '15 at 15:54
I must confess that I am afraid of the idea of nested optimizers! Think about solving two equations $f(x_1,x_2)=0$, $g(x_1,x_2)=0$ with your method instead of a global (simultaneous) approach. If your problem satisfies the cross-derivative condition, you just need the sequential approach. – Claude Leibovici May 22 '15 at 15:58
My particular use case of the nested optimizer is that the inner optimization can be done very cheaply, like least sq, or even analytically. Then the outer optimizer, which is numerical, can be much faster. – Tom Bennett May 22 '15 at 16:43
@ClaudeLeibovici If this minimization problem has constraints and I am solving it with Lagrange multipliers, do I need to include the constraints in the second step, or is it enough to include all conditions in the first step while getting $x_1$? – Pumpkin Oct 15 '17 at 19:12

score 3 · Accepted Answer · answered Jul 29 '19 at 11:56

The equality holds in every case where $x$ and $y$ are independent, i.e.

\begin{equation}\label{eq:0} \min_{y}\min_{x} f(x,y) = \min_{x,y} f(x,y) \end{equation}

Proof: \begin{equation}\label{eq:1} f(x^*,y^*)=\min_{x,y}f(x,y) \leq \min_{y}\min_{x}f(x,y)\,\,(\text{this holds trivially})\tag{1} \end{equation} where $(x^*, y^*) = \text{argmin}_{x,y}f(x,y)$.

Now, \begin{equation}\label{eq:2} \min_y\min_x f(x,y) \leq \min_y f(x^*,y)\tag{2} \end{equation} because $\min_x f(x,y) \leq f(x^*, y) \, \forall y$

and also

\begin{equation}\label{eq:3} \min_y f(x^*,y) \leq f(x^*,y^*) \, \text{(holds trivially)}\tag{3} \end{equation}

Now $\eqref{eq:2}$ and $\eqref{eq:3}$ imply \begin{equation}\label{eq:4} \min_y\min_x f(x,y) \leq f(x^*,y^*)\tag{4} \end{equation}

Overall, $\eqref{eq:1}$ and $\eqref{eq:4}$ imply \begin{equation} \min_{y}\min_{x} f(x,y) = \min_{x,y} f(x,y) \end{equation}

Please note that this proof uses no convexity or smoothness assumptions. Thus the claim holds in all circumstances, as long as the minima involved are attained.
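The claim can also be checked by brute force on a finite grid. The sketch below is an illustration only: the function `f` and the grids `xs` and `ys` are hypothetical choices, picked so that `f` is neither convex nor of the form $f_1(x) + f_2(y)$.

```python
import math


def f(x, y):
    # Hypothetical test function: non-convex and not separable in x and y.
    return x * y + math.sin(3.0 * x) + math.cos(2.0 * y)


# Finite grids standing in for the feasible sets of x and y (an assumption).
# On finite sets every minimum is attained.
xs = [-2.0 + 0.05 * i for i in range(81)]
ys = [-2.0 + 0.05 * j for j in range(81)]

# Joint minimization over all pairs (x, y).
joint_min = min(f(x, y) for x in xs for y in ys)

# Nested minimization in both possible orders.
min_y_min_x = min(min(f(x, y) for x in xs) for y in ys)
min_x_min_y = min(min(f(x, y) for y in ys) for x in xs)

print(joint_min, min_y_min_x, min_x_min_y)
```

All three values coincide exactly here, because each expression takes the smallest element of the same finite set of function values.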

If you assume $x$ and $y$ are independent, isn't it equivalent to assuming we can write $f(x,y)$ as $f_1(x) + f_2(y)$? Which would make the proof trivial? — durdi, Nov 24 '21 at 14:46
@durdi: In general it may not be possible to express $f(x,y)$ as $f_1(x) + f_2(y)$, for example when $f(x,y) = xy$. — Akshay Bansal, Nov 24 '21 at 22:10
So what do you mean when you say "$x$ and $y$ are independent"? — durdi, Nov 25 '21 at 18:00

Jaood · Answer 3 · 2015-05-22T07:30:51.713

1

Under some circumstances, yes.

Take $x_1$ fixed, and determine, as a function of $x_1$, the $x_2$ that minimizes the one-variable function $$f(x_1, x_2)$$ Assume that this gives some $x_2^\star(x_1)$ for every real $x_1$. Now consider the function $$p(x_1) = f(x_1,x_2^\star(x_1))$$ This is another single-variable function, and we again assume that it can be minimized. Let $x_1^\star$ be its minimizer. Then $(x_1^\star, x_2^\star(x_1^\star))$ is optimal for your original function, because for all $(x_1, x_2)$ $$f(x_1,x_2) \ge f(x_1,x_2^\star(x_1)) = p(x_1) \ge p(x_1^\star) = f(x_1^\star, x_2^\star(x_1^\star))$$

Example:

Consider $f(x,y) = -x + y^2 + \frac{1}{3}x^2$. For fixed $x$, if we wish to minimize, it is best to choose $y = 0$. So, consider $p(x) = f(x,0) = -x + \frac{1}{3}x^2$. This is minimized at $x=3/2$. Thus, $(3/2, 0)$ minimizes the original function.
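The example can be reproduced numerically with two nested one-dimensional searches, the inner one over $y$ and the outer one over $x$. This is a rough sketch, not part of the answer: the helper `golden_section_min`, the search interval $[-10, 10]$, and the tolerance are assumptions, and golden-section search is used only because both one-dimensional problems here are unimodal.

```python
import math


def golden_section_min(g, lo, hi, tol=1e-9):
    """Approximate the minimizer of a unimodal 1-D function g on [lo, hi]."""
    inv_phi = (math.sqrt(5.0) - 1.0) / 2.0
    a, b = lo, hi
    c = b - inv_phi * (b - a)
    d = a + inv_phi * (b - a)
    while b - a > tol:
        if g(c) < g(d):
            # The minimizer lies in [a, d]; shrink from the right.
            b, d = d, c
            c = b - inv_phi * (b - a)
        else:
            # The minimizer lies in [c, b]; shrink from the left.
            a, c = c, d
            d = a + inv_phi * (b - a)
    return (a + b) / 2.0


def f(x, y):
    # The function from the example above.
    return -x + y ** 2 + x ** 2 / 3.0


def inner_argmin(x):
    # Inner problem: for fixed x, find the minimizing y.
    return golden_section_min(lambda y: f(x, y), -10.0, 10.0)


def p(x):
    # Outer objective: f evaluated at the inner minimizer.
    return f(x, inner_argmin(x))


x_star = golden_section_min(p, -10.0, 10.0)
y_star = inner_argmin(x_star)
print(x_star, y_star, f(x_star, y_star))
```

The printed point should land close to $(3/2, 0)$, up to the accuracy of the searches.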

edited May 22 '15 at 07:30

answered May 22 '15 at 07:14

Jaood

When does it NOT work? – Tom Bennett May 22 '15 at 08:07
It depends on the function you are dealing with. It's not always possible (or easy) to determine an expression $x_2^\star(x_1)$ that you can plug in wherever you want.... – Jaood May 22 '15 at 08:37
Do you mean that this isn't always a viable solution method in practice? But in theory the equality is always correct? (Again, even if the expressions involved are difficult to impossible to evaluate in practice)? – Chill2Macht Sep 02 '18 at 01:16
@Jaood: Downvoted because of the mis-information that it holds "only in certain circumstances". – Akshay Bansal Jul 31 '19 at 06:18

score 1 · Answer 4 · answered Jul 27 '19 at 03:13

The answer by Jaood is correct. Moreover, if we assume $F(x,y)\in C^{2}$ with $x\in X,y\in Y, F_{xx}\neq 0, \arg\min_{(x,y) \in X\times Y}F(x,y)\in\text{int}X\times Y$, we can also show the equivalence in the first-order conditions (FOCs) and second-order conditions (SOCs) between the "nested" and "joint" approaches:

$\textit{Equivalence in FOCs:}$

We see this by the implicit function theorem since if $F_{x}(x,y)=0\implies x=x^{*}(y)$, then we have that $x^{*'}(y)=-\frac{F_{xy}(x^{*}(y),y)}{F_{xx}(x^{*}(y),y)}$, which is well-defined by the assumption $F_{xx}\neq0$. The FOC of $F(x^{*}(y),y)$ wrt $y$ is then given by

$0=\partial_{y}F(x^{*}(y),y)=F_{x}(x^{*}(y),y)x^{*'}(y)+F_{y}(x^{*}(y),y)=F_{y}(x^{*}(y),y)$,

where the second equality follows from the chain rule and the third equality by construction since $F_{x}(x^{*}(y),y)=0$. Thus, we obtain $F_{y}(x,y)=0$ with $x=x^{*}(y)$ from the nested approach, equivalent to $F_{y}(x,y)=0$ and $F_{x}(x,y)=0$ from the joint approach.

$\textit{Equivalence in SOCs:}^1$

First observe that, by the assumption that $F(x,y)\in C^{2}$, we have equality of mixed partials: $F_{xy}(x^{*}(y),y)=F_{yx}(x^{*}(y),y)$. Now, the SOC to ensure a minimum via the nested approach is that $F_{xx}(x^{*}(y),y)>0\ \forall y\in Y$ and

$$0<\partial_{y}^{2}F(x^{*}(y),y)\big|_{y=y^{*}}=\partial_{y}F_{y}(x^{*}(y),y)\big|_{y=y^{*}}=\left[F_{yx}(x^{*}(y),y)x^{*'}(y)+F_{yy}(x^{*}(y),y)\right]_{y=y^{*}}=-\frac{(F_{xy}(x^{*}(y^{*}),y^{*}))^{2}}{F_{xx}(x^{*}(y^{*}),y^{*})}+F_{yy}(x^{*}(y^{*}),y^{*}),$$

where the penultimate equality results from the chain rule and the final equality from the equality of mixed partials. Thus, we obtain $F_{xx}(x,y)>0$ for $x=x^{*}(y),\forall y\in Y$ and $-\frac{(F_{xy}(x,y))^{2}}{F_{xx}(x,y)}+F_{yy}(x,y)>0$ for $y=y^{*},x=x^{*}(y^{*})$ from the nested approach, which implies the SOC of the joint approach using the Hessian: $F_{xx}(x,y)>0, F_{xx}(x,y)F_{yy}(x,y)-(F_{xy}(x,y))^{2}>0$ at the minimizer.
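As a sanity check (a hypothetical example, not from the original answer), take $F(x,y)=x^{2}+xy+y^{2}$ with $X=Y=\mathbb{R}$, so that $F_{xx}=2$, $F_{xy}=1$, $F_{yy}=2$. The inner FOC $F_{x}=2x+y=0$ gives

$$x^{*}(y)=-\frac{y}{2},\qquad x^{*'}(y)=-\frac{1}{2}=-\frac{F_{xy}}{F_{xx}},$$

and substituting back,

$$F(x^{*}(y),y)=\frac{y^{2}}{4}-\frac{y^{2}}{2}+y^{2}=\frac{3}{4}y^{2},\qquad \partial_{y}F(x^{*}(y),y)=\frac{3}{2}y=F_{y}(x^{*}(y),y),$$

so $y^{*}=0$ and $x^{*}(y^{*})=0$, the same point as the joint FOCs $2x+y=0$, $x+2y=0$. For the SOCs,

$$\partial_{y}^{2}F(x^{*}(y),y)=\frac{3}{2}=-\frac{F_{xy}^{2}}{F_{xx}}+F_{yy}=-\frac{1}{2}+2>0,\qquad F_{xx}F_{yy}-F_{xy}^{2}=3>0,$$

which agrees with the joint Hessian test.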

1: In fact, it seems the nested approach has collectively stronger SOCs if one checks the SOC for each nested minimization. Particularly, convexity in the $x$ direction (at the optimal $x^{*}(y))$ from the inner minimization holding for every $y\in Y$ is stronger than convexity in the $x$ direction occurring only at the minimizer.
