
We know that for a group of highly collinear features, LASSO tends to select one of them and set the coefficients of the others to zero. Many references say that the selected feature is chosen at random by LASSO. I am not sure this conclusion is right, and I don't know the reason behind it. Could you point me to some related references?

1 Answer


I think there is a small confusion here.

For a fixed data vector $y$ and a matrix $A$, the number of solutions to the LASSO problem, defined as

$$\min_x \|Ax-y\|_2^2+\lambda \|x\|_1$$

is either one or infinite. In both cases, the strongest feature (or features, if there are several) is never chosen at random. The process is entirely deterministic.
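The "one or infinite" dichotomy follows from convexity of the objective. As a short sketch, write $f$ for the objective above and $f^\star$ for its minimum value (both symbols are introduced here for illustration). If $x$ and $x'$ are two distinct minimizers, then for any $t \in [0,1]$

$$f\big(t x + (1-t) x'\big) \le t f(x) + (1-t) f(x') = f^\star,$$

so every point on the segment between $x$ and $x'$ is also a minimizer, and there are infinitely many of them.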

However, when the solution is computed numerically, the result may differ depending on the algorithm, the initialization, and the tolerance required on the solution.
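A minimal sketch of this dependence on the algorithm, assuming the common squared-loss scaling $\frac{1}{2}\|Ax-y\|_2^2+\lambda\|x\|_1$ and a plain cyclic coordinate descent written here for illustration. The helper names `soft_threshold`, `lasso_cd` and `objective` are hypothetical and not taken from any library, and the data are a toy choice. With two exactly duplicated columns the minimizer is not unique, and the order in which the coordinates are updated decides which one the solver returns:

```python
def soft_threshold(v, t):
    """Soft-thresholding operator: shrink v towards zero by t."""
    if v > t:
        return v - t
    if v < -t:
        return v + t
    return 0.0


def lasso_cd(columns, y, lam, order, n_sweeps=50):
    """Cyclic coordinate descent for 0.5*||Ax - y||^2 + lam*||x||_1.

    `columns` holds the columns of A and `order` is the order in which
    the coordinates are updated. The iterate starts at zero.
    """
    p = len(columns)
    x = [0.0] * p
    for _ in range(n_sweeps):
        for j in order:
            # Residual of the fit that ignores feature j.
            partial = [
                yi - sum(columns[k][i] * x[k] for k in range(p) if k != j)
                for i, yi in enumerate(y)
            ]
            rho = sum(a * r for a, r in zip(columns[j], partial))
            x[j] = soft_threshold(rho, lam) / sum(a * a for a in columns[j])
    return x


def objective(columns, y, x, lam):
    """Value of 0.5*||Ax - y||^2 + lam*||x||_1 at x."""
    residual = [
        yi - sum(col[i] * xj for col, xj in zip(columns, x))
        for i, yi in enumerate(y)
    ]
    return 0.5 * sum(r * r for r in residual) + lam * sum(abs(v) for v in x)


a = [1.0, 2.0, 3.0, 4.0]
columns = [a, a]            # two perfectly collinear features (toy data)
y = [2.0, 4.0, 6.0, 8.0]    # y = 2 * a, no noise
lam = 15.0

x_forward = lasso_cd(columns, y, lam, order=[0, 1])
x_backward = lasso_cd(columns, y, lam, order=[1, 0])
print(x_forward, objective(columns, y, x_forward, lam))    # [1.5, 0.0] 26.25
print(x_backward, objective(columns, y, x_backward, lam))  # [0.0, 1.5] 26.25
```

Both vectors reach the same objective value, as does any split such as `[0.75, 0.75]`. Each run is deterministic, but which minimizer is returned depends on the update order.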

Furthermore, if the data vector is the result of an experiment, then randomness is involved in the determination of $y$. If $A$ contains strongly collinear features, the solution may also be unstable, so several runs of the same experiment might yield different selections.

As a side note, if this happens, the selection among the collinear features is unlikely to be uniform. Strictly speaking, being randomly selected does not necessarily mean that the different outcomes are equally probable.
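The effect of noise in $y$ can be explored with a small simulation. The sketch below is only illustrative: it assumes the scaling $\frac{1}{2}\|Ax-y\|_2^2+\lambda\|x\|_1$, Gaussian noise, and a toy design with two strongly collinear columns. The function names and all numerical values ($\sigma$, $\lambda$, the offset 0.2, the seed) are arbitrary choices made here, not taken from a reference. For each noisy draw of $y$ it solves the LASSO by coordinate descent on the Gram matrix and records which feature is kept:

```python
import random


def soft_threshold(v, t):
    """Soft-thresholding operator: shrink v towards zero by t."""
    if v > t:
        return v - t
    if v < -t:
        return v + t
    return 0.0


def lasso_two_features(gram, corr, lam, n_sweeps=200):
    """Coordinate descent on the Gram form of 0.5*||Ax - y||^2 + lam*||x||_1.

    gram[j][k] = a_j . a_k and corr[j] = a_j . y for the two columns of A.
    """
    x = [0.0, 0.0]
    for _ in range(n_sweeps):
        for j in (0, 1):
            k = 1 - j
            rho = corr[j] - gram[j][k] * x[k]
            x[j] = soft_threshold(rho, lam) / gram[j][j]
    return x


def selection_counts(trials=200, sigma=1.0, lam=2.0, seed=0):
    """Count which feature is kept over repeated noisy measurements of y."""
    rng = random.Random(seed)
    n = 8
    # Two strongly collinear columns: a common signal plus a small offset.
    a0 = [1.0 + 0.2 * (-1) ** i for i in range(n)]
    a1 = [1.0 - 0.2 * (-1) ** i for i in range(n)]
    cols = [a0, a1]
    gram = [[sum(u * v for u, v in zip(p, q)) for q in cols] for p in cols]
    counts = {"only feature 0": 0, "only feature 1": 0, "both or neither": 0}
    for _ in range(trials):
        # Fresh noise on y for every repetition of the "experiment".
        y = [1.0 + sigma * rng.gauss(0.0, 1.0) for _ in range(n)]
        corr = [sum(u * v for u, v in zip(col, y)) for col in cols]
        x0, x1 = lasso_two_features(gram, corr, lam)
        if x0 != 0.0 and x1 == 0.0:
            counts["only feature 0"] += 1
        elif x0 == 0.0 and x1 != 0.0:
            counts["only feature 1"] += 1
        else:
            counts["both or neither"] += 1
    return counts


print(selection_counts())
```

With a fixed seed the counts are reproducible, which is consistent with the point above: the variability comes from the noise in $y$, not from the LASSO problem itself. The two columns here are built symmetrically; with an asymmetric design the split between the two features need not be even.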

nicomezi
  • So, as a summary, the randomness comes from the instability of the numerical algorithms for LASSO (e.g. modified LARS, proximal gradient descent, etc.), and there is no specific theory explaining how the randomness arises. – user6703592 Oct 02 '21 at 11:06
  • Algorithms are one part; noise in the data is another reason. As far as I know, there is no particular theory on this, only numerical experiments (I will link a reference if I find one). The LASSO tends to minimize the number of features used to explain the data. If there are strongly collinear atoms in the dictionary (columns of $A$), the choice of one of them may be highly sensitive to the data, resulting in instability, which can be modelled as randomness. – nicomezi Oct 03 '21 at 05:09
  • I have another related question; would you be happy to answer it? https://stats.stackexchange.com/questions/546179/how-does-coefficient-change-if-we-multiply-a-predictor-by-a-constant-in-lasso-ri?noredirect=1#comment1002740_546179 – user6703592 Oct 03 '21 at 10:16
  • I think the best way to answer that second question is to run some tests on a toy example. My intuition would be that a predictor scaled by $c>1$ will be more likely to be set to $0$ (so the same as yours, if I have understood your explanation correctly). – nicomezi Oct 03 '21 at 11:00