When is minimizing a functional over a subset equivalent to approximating its global minimizer?

Question

Let $\mathcal X$ be some compact subset of $\mathbb R^d$, and define $(\mathcal M(\mathcal X),\|\cdot\|) $ to be the space of bounded and real-valued measurable functions on $\mathcal X $, equipped with the supremum norm $\|\cdot\| $.

Let $\mathcal F \subset \mathcal M(\mathcal X)$ be a closed subset of functions defined on $\mathcal X$. For my purposes, I can't assume that $\mathcal F$ has much structure: it is neither a vector space nor convex. But if it helps, I am okay with initially relaxing this requirement.

Now, let $J: \mathcal M(\mathcal X)\to\mathbb R^+$ be a cost functional and assume we want to minimize it over the family of functions $\mathcal F$. In other words, we want to solve $$\min_{f\in\mathcal F}{J(f)} \tag 1$$ Furthermore, assume that there exists a unique function $f^*$ that minimizes $J$ over all functions in $\mathcal M(\mathcal X)$: $$f^* := \arg\min_{f\in\mathcal M(\mathcal X)} J(f)$$

Intuitively, one would think that if $J$ is well-behaved enough (e.g. convex, smooth, ...), minimizing $J$ over $\mathcal F$ is equivalent to finding the function $f\in\mathcal F$ that best approximates $f^*$, i.e. that solving problem $(1)$ is equivalent to solving $$\min_{f\in\mathcal F} \|f-f^*\| \tag2 $$ The two problems are clearly related (see this nice blog post which inspired this question), but are they actually equivalent? More rigorously, I want to know under what (minimal) assumptions on $J$ the following statement holds: $$g\in\arg\min_{f\in \mathcal F} J(f)\iff g\in\arg\min_{f\in \mathcal F} \|f-f^*\| \tag3$$ I am especially interested in the direction $\Rightarrow$, though of course I will take any hints or references that would help with either direction.

My thoughts: I initially thought (hoped) that $J$ being convex would be sufficient for the result to hold, but that is not the case, as the following counterexample shows:

Let $J$ be the "$1$-norm" (i.e. the $C^1$ norm): $$\|f\|_1\equiv J(f):=\begin{cases}\sup_{x\in\mathcal X} \big(|f(x)| + |f'(x)|\big) & \text{if } f\in C^1(\mathcal X),\\ +\infty & \text{otherwise.}\end{cases} $$ Then clearly the global minimizer of $J$ is the $0$ function, but it is possible to find functions with arbitrarily large $\|\cdot\|_1$ norm and arbitrarily small $\|\cdot\|_\infty$ norm. So, for instance, let $\mathcal X\equiv[0,1]\subseteq \mathbb R$ and $$\mathcal F =\{x\mapsto\varepsilon\sin(Mx),\ x\mapsto (\varepsilon/2)\sin(M^2 x)\mid \varepsilon\in[1/2,1],\ M\ge10^6\}. $$ Then the minimizer of $J$ over $\mathcal F$ is $\phi :x\mapsto \frac{1}{2}\sin(10^6 x)$, with $\|\phi\|_\infty=\frac12$, but the map $\varphi :x\mapsto \frac{1}{4}\sin(10^{12} x)\in\mathcal F$, with $\|\varphi\|_\infty=\frac14$, is closer to the zero function in supremum norm.
In this example, $\mathcal F$ is particularly badly behaved, but I think similar counterexamples can be found for "nicer" sets too.
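The claims about $\phi$ and $\varphi$ can be checked from closed forms. Since $[0,1]$ contains a full period of $x\mapsto Mx$ when $M\ge10^6$, one gets $J(\varepsilon\sin(M\cdot))=\varepsilon\sup_t\big(|\sin t|+M|\cos t|\big)=\varepsilon\sqrt{1+M^2}$ (by Cauchy-Schwarz) and $\|\varepsilon\sin(M\cdot)\|=\varepsilon$; likewise $J(\tfrac\varepsilon2\sin(M^2\cdot))=\tfrac\varepsilon2\sqrt{1+M^4}$ and $\|\tfrac\varepsilon2\sin(M^2\cdot)\|=\tfrac\varepsilon2$. The Python sketch below checks the $\sqrt{1+M^2}$ identity numerically for moderate $M$, then compares both minimizations on a hypothetical finite grid of $(\varepsilon, M)$ values standing in for the full family.

```python
import math

def sup_abs_sin_plus_M_abs_cos(M, samples=100_000):
    """Dense-sampling estimate of sup_t (|sin t| + M|cos t|).

    The function has period pi, so sampling t in [0, pi] is enough.
    """
    return max(
        abs(math.sin(t)) + M * abs(math.cos(t))
        for t in (math.pi * k / samples for k in range(samples + 1))
    )

# Check the closed form sup_t (|sin t| + M|cos t|) = sqrt(1 + M^2) for moderate M.
for M in (1.0, 10.0, 1000.0):
    exact = math.sqrt(1.0 + M * M)
    assert abs(sup_abs_sin_plus_M_abs_cos(M) - exact) <= 1e-6 * exact

def J_first(eps, M):
    # J(x -> eps*sin(M x)) on [0, 1]; valid because [0, 1] contains a full period.
    return eps * math.sqrt(1.0 + M ** 2)

def J_second(eps, M):
    # J(x -> (eps/2)*sin(M^2 x)) on [0, 1], by the same closed form with M^2.
    return (eps / 2) * math.sqrt(1.0 + M ** 4)

# Hypothetical finite grid standing in for eps in [1/2, 1] and M >= 1e6.
candidates = []  # tuples (family, eps, M, J value, sup norm)
for eps in (0.5, 0.75, 1.0):
    for M in (1e6, 1e7, 1e8):
        candidates.append(("eps*sin(Mx)", eps, M, J_first(eps, M), eps))
        candidates.append(("(eps/2)*sin(M^2 x)", eps, M, J_second(eps, M), eps / 2))

best_J = min(candidates, key=lambda c: c[3])    # problem (1)
best_sup = min(candidates, key=lambda c: c[4])  # problem (2) with f* = 0

print("argmin J:  ", best_J[:3], "J =", best_J[3], "sup =", best_J[4])
print("argmin sup:", best_sup[:3], "J =", best_sup[3], "sup =", best_sup[4])
```

The grid only samples $\mathcal F$, but since $J$ is increasing in both $\varepsilon$ and $M$ on each branch, its minimum over the whole family is attained at $\varepsilon=1/2$, $M=10^6$ on the first branch, matching the statement above.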

Either way, $J$ being convex is not enough, but I suspect that additional regularity assumptions such as Lipschitz continuity could do the trick. I haven't been able to prove it though...

I don't see any reason for something like this to be true without more assumptions on $J$. If $J$ isn't e.g. convex then it could take, say, the value $0$ on the minimizer (which is not in $F$), the value $1$ on some functions in $F$ close to the minimizer, and the value $\frac{1}{2}$ on some functions in $F$ further away from the minimizer. — Qiaochu Yuan, Aug 15 '22 at 11:41
+1. However, I don't think this is the right question to ask, even though the idea is clear. There are many mathematical details which are off. First, there is no norm on $\mathbb R^\mathcal{X}$, that's too big a space, a proper subspace must be chosen. I guess some Hilbert space such as $L^2(\mathcal X)$ is the right starting point, assuming $\mathcal X$ is a domain. Then, what is $\mathcal F$? If it is a closed subspace then something can be said. In full generality it seems really hard. — Giuseppe Negro, Sep 29 '22 at 14:07
@GiuseppeNegro thanks for your comment. You are right, the set of all functions is a bit too large. I chose that one because I wanted to put as few restrictions as possible on the optimal value $f^*$, but I realize that it's not practical to work with. I will edit my post. — Stratos supports the strike, Sep 29 '22 at 18:01
An obstacle of this kind appears already with integer linear programs. That is, the minimizer of a linear objective function over the integer lattice need not be the closest integer-valued point to the minimizer over the feasible region of $\mathbb R^n$. — hardmath, Nov 03 '22 at 14:25
@hardmath thanks for your comment. Indeed, I have realized now that the answer to this question is negative in most settings, unless very strong assumptions are made on $J$. I will edit my answer into a more definitive one later. — Stratos supports the strike, Nov 03 '22 at 15:41

Stratos supports the strike · Answer 1 · 2022-11-07T10:50:43.390

Here is an answer which, although not fully definitive, shows that one can't expect such a statement to hold in most settings:

Indeed, let $J$ be the squared $L^2$ distance to some function $f_0\in\mathcal M(\mathcal X)$, i.e. $$J(f) := \|f-f_0\|_2^2=\int_{\mathcal X} (f-f_0)^2\, d\mu $$ (note that $\mathcal M(\mathcal X) \subseteq L^2(\mathcal X)$ since $\mathcal X$ is compact and therefore has finite measure, so $J$ is well-defined). $J$ is convex, differentiable and locally Lipschitz with respect to $\|\cdot\|$, and clearly its unique minimizer (up to $\mu$-a.e. equality) is $f^*\equiv f_0$. However, the equivalence $(3)$ we are interested in then translates in this setting to $$\arg\min_{f\in\mathcal F}\|f-f_0\|_2 = \arg\min_{f\in \mathcal F}\|f-f_0\| $$

But $\|\cdot\|$ is the uniform norm, and best approximations in the $L^2$ norm and in the uniform norm differ in general, even when $\mathcal F$ is a convex set, let alone when it is not.
Hence, even in "nice" settings, the equivalence won't hold.
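A concrete instance, chosen here purely for illustration: take $\mathcal X=[0,1]$, $f_0(x)=x^2$, and let $\mathcal F$ be the constant functions, which form a convex set (even a subspace). The best $L^2$ constant is the mean $\int_0^1 x^2\,dx=1/3$, whereas the best uniform constant is the midrange $1/2$, so the two argmins above differ. The Python sketch below checks this by brute force over a grid of constants, using a midpoint rule for the integral and a uniform grid for the supremum.

```python
def l2_error_sq(c, xs):
    # Midpoint-rule approximation of \int_0^1 (x^2 - c)^2 dx
    return sum((x * x - c) ** 2 for x in xs) / len(xs)

def sup_error(c, xs):
    # Uniform distance max_x |x^2 - c| over the sample points
    return max(abs(x * x - c) for x in xs)

n = 2000
xs_mid = [(i + 0.5) / n for i in range(n)]   # midpoints for the integral
xs_grid = [i / n for i in range(n + 1)]      # includes both endpoints for the sup
constants = [k / 600 for k in range(601)]    # candidate constants c in [0, 1]

c_l2 = min(constants, key=lambda c: l2_error_sq(c, xs_mid))
c_sup = min(constants, key=lambda c: sup_error(c, xs_grid))

print(f"best L2 constant:      c = {c_l2:.4f}, sup error = {sup_error(c_l2, xs_grid):.4f}")
print(f"best uniform constant: c = {c_sup:.4f}, sup error = {sup_error(c_sup, xs_grid):.4f}")
```

It should report $c\approx 1/3$ (with uniform error $2/3$) for the $L^2$ problem and $c=1/2$ (with uniform error $1/2$) for the uniform one.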

However, if we're willing to make additional, very strong assumptions on $J$, the equivalence may hold.
Namely, assume that:

$(A1)$ $J$ is twice differentiable in the sense defined in Gelfand and Fomin's Calculus of Variations, i.e. for all $f\in\mathcal M(\mathcal X)$ there exist a linear functional $\phi_1$ and a quadratic functional $\phi_2$ such that for all $h\in\mathcal M(\mathcal X)$ $$J(f+h)-J(f) = \phi_1(h) + \phi_2(h) + \varepsilon\|h\|^2, $$ with $\phi_2(h)\to 0$ as $\|h\|\to0$ and $\varepsilon\to 0$ as $\|h\|\to0$. ($\phi_1$ and $\phi_2$ are respectively referred to as the first and second variations of $J$ at the point $f$.)

$(A2)$ The second variation $\phi_2$ is strongly positive, which the same book defines as $$\phi_2(h)\ge\kappa \|h\|^2 \quad\text{for all } h\in\mathcal M(\mathcal X),$$ for some constant $\kappa>0$ (here taken to be the same at every point $f$, as the proof below requires).

$(A3)$ $J$ is Lipschitz continuous, i.e. there exists $L>0$ such that for all $f_1,f_2\in\mathcal M(\mathcal X)$ $$|J(f_1) - J(f_2)|\le L \|f_1-f_2\| $$

$(A4)$ $J$ is what I call strongly coercive: for $\delta:=\inf_{f\in\mathcal F} \|f-f^*\|$, which is positive as long as $f^*\notin\mathcal F$ (recall that $\mathcal F$ is closed), the following inequality holds $$\kappa \ge \frac{L}{\delta} $$

Then we have the following.

Claim: Under assumptions $(A1)-(A4)$, the following equivalence holds: $$g\in\arg\min_{f\in \mathcal F} J(f)\iff g\in\arg\min_{f\in \mathcal F} \|f-f^*\| $$

Sketch of proof: We prove the implication $\Rightarrow$.
Let $\hat f \in \arg\min_{f\in\mathcal F} J(f)$, let $f\in\mathcal F$ be arbitrary, and let $h:=\hat f-f^*$. Expanding $J$ around the global minimizer $f^*$, assumptions $(A1)$ and $(A2)$ give $$\begin{align} J(f^*+h) - J(f^*) = J(\hat f) - J(f^*) &=\phi_1(h) + \phi_2(h) + \varepsilon\|h\|^2 \\ &=0+ \phi_2(h) + \varepsilon\|h\|^2 \tag{i} \\ &=\tilde\phi_2(h) \tag{ii} \\ &\ge \kappa\|h\|^2 =\kappa\|\hat f-f^*\|^2, \end{align} $$ where $\phi_1,\phi_2$ are the variations of $J$ at $f^*$. In line $(\mathrm i)$ we have used that the first variation necessarily vanishes at an extremum (see Chapter 1, Theorem 2 of Gelfand's book), and in line $(\mathrm{ii})$ a "functional mean value theorem" (see here), $\tilde\phi_2$ denoting the second variation at some intermediate point. Combining this inequality with assumption $(A3)$ and the fact that $J(f^*)\le J(\hat f)\le J(f)$, we get $$\kappa\|f^*-\hat f\|^2 \le J(\hat f)-J(f^*) \le J(f) - J(f^*)\le L\|f - f^*\|. $$ Finally, the strong coercivity assumption $(A4)$ gives $L/\kappa\le\delta\le\|f^*-f\|$, so that $$\|f^*-\hat f\|^2 \le\frac{L}{\kappa}\|f^*-f\| \le\|f^*- f\|^2. $$ Since $f\in\mathcal F$ was arbitrary, $\hat f$ minimizes $\|f-f^*\|$ over $\mathcal F$, which is the desired conclusion. A similar argument shows the other direction as well.

Regarding the strength of the assumptions, I would say that Lipschitz continuity $(A3)$ and twice differentiability $(A1)$ are rather reasonable, but strong positivity $(A2)$ is already quite difficult to satisfy for many functionals of practical interest. Assumption $(A4)$ stands out as particularly ridiculous: if the class $\mathcal F$ is large enough, one may expect it to approximate $f^*$ well, i.e. $\delta = \inf_{f\in\mathcal F} \|f-f^*\|$ will tend to vanish, which means that the lower bound on $\kappa$ (or the upper bound on $L$) is practically impossible to satisfy.
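To see how demanding $(A2)$ is, take the $L^2$ functional $J(f)=\int_{\mathcal X}(f-f_0)^2\,d\mu$ from the start of this answer (a worked example, assuming $\mu$ is the Lebesgue measure and $\mathcal X=[0,1]$). Expanding the square gives its variations exactly, with $\varepsilon=0$:
$$J(f+h)-J(f)=\underbrace{2\int_{\mathcal X}(f-f_0)\,h\,d\mu}_{\phi_1(h)}+\underbrace{\int_{\mathcal X}h^2\,d\mu}_{\phi_2(h)}.$$
For the bounded measurable functions $h_n:=\mathbf 1_{[0,1/n]}$ one has
$$\|h_n\|=1,\qquad \phi_2(h_n)=\frac1n\longrightarrow0,$$
so no constant $\kappa>0$ satisfies $\phi_2(h)\ge\kappa\|h\|^2$ for all $h$: strong positivity with respect to the supremum norm already fails for this very regular convex functional.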

It would be interesting to see if and how assumptions $(A1)-(A4)$ can be relaxed, but in general, we can safely conclude that minimizing a functional is NOT equivalent to approximating its minimizer.
