Often we would like to solve an optimization problem such as $$ \text{minimize} \quad f(x) + \alpha \| x \|_0, $$ where the optimization variable is $x \in \mathbb R^n$, $f:\mathbb R^n \to \mathbb R$ is convex, $\alpha > 0$, and $ \| x \|_0$ is the number of nonzero components of $x$. Unfortunately, this optimization problem is non-convex. So, as a heuristic, people instead solve the problem $$ \text{minimize} \quad f(x) + \beta \| x \|_1, $$ where the "$\ell_0$-norm" is replaced by the $\ell_1$-norm. (Here $\beta > 0$.) This heuristic approach often gives very good results.
Question: Under what conditions is it guaranteed that any minimizer for the second problem is also a global minimizer for the first problem? When is it guaranteed that any minimizer for the second problem has the same sparsity pattern as the global minimizer for the first problem (assuming a global minimizer for the first problem exists and is unique)?
I think this question has been studied extensively in compressed sensing theory, and I'd like to know what kinds of results are available.
(By the way, of course, the "$\ell_0$-norm" is not really a norm.)
