1

The classical example is a saddle point of the function $$f: \mathbb R^2 \to \mathbb R, \\ \min_x \max_y f(x, y) = \min_x \max_y xy$$ If one applies "gradient decent" to this problem, i.e. follows $g(x, y) = (-x, y)$, then the method would get stuck on the circular trajectory because that "gradient-like function" is "tangent". My question is, why does this happen? is there any value, like a conditional number of a function or something to do with eigenvalues of its Hessian, that can indicate that "gradient decent will not converge to the local\global saddle point in this case"? If a function is strictly convex in $x$ and strictly concave $y$ would that also be the case?

Brad S.
  • 1,876
Ben Usman
  • 413

1 Answers1

1

They're not inherently harder. Note that for the more general case $f(x,y)=g(x) + x^T A y - h(y)$, where $g$ and $h$ are convex, we have

$\inf_x \sup_y \big(g(x) + x^T Ay - h(y)\big) = \inf_x \big( g(x) + h^*(A^Tx)\big),$

where $h^*$ is the conjugate of $h$. So the problem can, in general, be reduced to the convex case. (This does not mean a gradient-type method will work well on it.)

For your problem $g(x) = 0$, $h(x) = 0$, and $A = 1$. The conjugate of $h^*(x)=0$ if $x=0$ and is $\infty$ otherwise. So the problem is $\inf_x \big( g(x) + h^*(A^Tx)\big) = \inf_{x=0} 0 = 0$.