I am wondering if I can minimize a multi-variable function one variable at a time. In other words, is it true that:
$min_{x_1,x_2} f(x_1,x_2)=min_{x_1} min_{x_2} f(x_1,x_2)$
I am wondering if I can minimize a multi-variable function one variable at a time. In other words, is it true that:
$min_{x_1,x_2} f(x_1,x_2)=min_{x_1} min_{x_2} f(x_1,x_2)$
Considering that you want to minimize $$f(x_1, x_2)$$ at the solution would be satisfied the two equations $$\frac{df(x_1, x_2)}{dx_1}=\frac{df(x_1, x_2)}{dx_2}=0$$ In order the method works you then need that the derivative with respect to to $x_1$ does not depend on $x_2$ and that the derivative with respect to $x_2$ does not depend on $x_1$. In other words, this would work if $$\frac{d^2f(x_1, x_2)}{dx_1\,dx_2}=0$$ You could be amazed by the fact that $55$ years ago, when I started with scientific computing (the lagrest computer at that time had probably less power than a cell phone today !), this was one method which was used but iteratively (even for many variables, hoping that the minimum would be unique).
The procedure was :
The equality holds for every possible case whenever $x$ and $y$ are independent i.e.
\begin{equation}\label{eq:0} \min_{y}\min_{x} f(x,y) = \min_{x,y} f(x,y) \end{equation}
Proof: \begin{equation}\label{eq:1} f(x^*,y^*)=\min_{x,y}f(x,y) \leq \min_{y}\min_{x}f(x,y)\,\,(\text{this holds trivially})\tag{1} \end{equation} where $(x^*, y^*) = \text{argmin}_{x,y}f(x,y)$.
Now, \begin{equation}\label{eq:2} \min_y\min_x f(x,y) \leq \min_y f(x^*,y)\tag{2} \end{equation} because $\min_x f(x,y) \leq f(x^*, y) \, \forall y$
and also
\begin{equation}\label{eq:3} \min_y f(x^*,y) \leq f(x^*,y^*) \, \text{(holds trivially)}\tag{3} \end{equation}
Now $\eqref{eq:2}$ and $\eqref{eq:3}$ implies \begin{equation}\label{eq:4} \min_y\min_x f(x,y) \leq f(x^*,y^*)\tag{4} \end{equation}
Overall \eqref{eq:1} and \eqref{eq:4} implies \begin{equation} \min_{y}\min_{x} f(x,y) = \min_{x,y} f(x,y) \end{equation}
Please note that in this proof I haven't used any restrictions of convexity or smoothness. Thus the claim is true for all circumstances.
Under some circumstances, yes.
Take $x_1$ fixed, and determine, as a function of $x_1$, that $x_2$ which minimizes the one variable function, $$f(x_1, x_2)$$ Assume that this will give you some $x_2^\star(x_1)$ for all $x_1$ in the reals. Now, consider the function $$p(x_1) = f(x_1,x_2^\star(x_1))$$ This is another single variable function, and we again assume that it can be minimized. Let $x_1^\star$ be the optimum. Then, $(x_1^\star, x_2^\star(x_1^\star))$ is optimum for your original function, because $$f(x_1,x_2) \ge f(x_1,x_2^\star(x_1)) = p(x_1) \ge p(x_1^\star) \ge f(x_1^\star, x_2^\star(x_1^\star))$$
Example:
Consider $f(x,y) = -x + y^2 + \frac{1}{3}x^2$. For fixed $x$, if we wish to minimize, it is best to choose $y = 0$. So, consider $p(x_1) = f(x_1,0) = -x + \frac{1}{3}x^2$. This is minimized at $x=3/2$. Thus, (3/2, 0) minimizes the original function.
The answer by Jaood is correct. Moreover, if we assume $F(x,y)\in C^{2}$ with $x\in X,y\in Y, F_{xx}\neq 0, \arg\min_{(x,y) \in X\times Y}F(x,y)\in\text{int}X\times Y$, we can also show the equivalence in the first-order conditions (FOCs) and second-order conditions (SOCs) between the "nested" and "joint" approaches:
$\textit{Equivalence in FOCs:}$
We see this by the implicit function theorem since if $F_{x}(x,y)=0\implies x=x^{*}(y)$, then we have that $x^{*'}(y)=-\frac{F_{xy}(x^{*}(y),y)}{F_{xx}(x^{*}(y),y)}$, which is well-defined by the assumption $F_{xx}\neq0$. The FOC of $F(x^{*}(y),y)$ wrt $y$ is then given by
$0=\partial_{y}F(x^{*}(y),y)=F_{x}(x^{*}(y),y)x^{*'}(y)+F_{y}(x^{*}(y),y)=F_{y}(x^{*}(y),y)$,
where the second equality follows from the chain rule and the third equality by construction since $F_{x}(x^{*}(y),y)=0$. Thus, we obtain $F_{y}(x,y)=0$ with $x=x^{*}(y)$ from the nested approach, equivalent to $F_{y}(x,y)=0$ and $F_{x}(x,y)=0$ from the joint approach.
$\textit{Equivalence in SOCs:}^1$
First observe that by the assumption that $F(x,y)\in C^{2}$, we have equality of mixed partials:$F_{xy}(x^{*}(y),y)=F_{yx}(x^{*}(y),y)$. Now, the SOC to ensure a minimum via the nested approach is that $F_{xx}(x^{*}(y),y)>0\ \forall y\in Y$ and $0<\partial_{y}^{2}F(x^{*}(y),y)|_{y=y^{*}}=\partial_{y}F_{y}(x^{*}(y),y)|_{y=y^{*}}=F_{yx}(x^{*}(y),y)x^{*'}(y)+F_{yy}(x^{*}(y),y)|_{y=y^{*}}=-\frac{(F_{xy}(x^{*}(y^{*}),y^{*}))^{2}}{F_{xx}(x^{*}(y^{*}),y^{*})}+F_{yy}(x^{*}(y^{*}),y^{*}),$
where the penultimate equality results from the chain rule and the final equality from the equality of mixed partials. Thus, we obtain $F_{xx}(x,y)>0$ for $x=x^{*}(y),\forall y\in Y$ and $-\frac{(F_{xy}(x,y))^{2}}{F_{xx}(x,y)}+F_{yy}(x,y)>0$ for $y=y^{*},x=x^{*}(y^{*})$ from the nested approach, which implies the SOC of the joint approach using the Hessian: $F_{xx}(x,y)>0, F_{xx}(x,y)F_{yy}(x,y)-(F_{xy}(x,y))^{2}>0$ at the minimizer.
1: In fact, it seems the nested approach has collectively stronger SOCs if one checks the SOC for each nested minimization. Particularly, convexity in the $x$ direction (at the optimal $x^{*}(y))$ from the inner minimization holding for every $y\in Y$ is stronger than convexity in the $x$ direction occurring only at the minimizer.