
Could someone please explain to me why exactly linear regression is considered ill-posed?

Susan
    This post is somewhat ill-posed. For instance, it would be better with some context. Where have you heard someone claim that linear regression is ill-posed? In what setting? – Arthur Nov 02 '20 at 12:54

1 Answer


According to Wikipedia, a problem is well-posed if:

  1. a solution exists,
  2. the solution is unique,
  3. the solution's behaviour changes continuously with the initial conditions.

Now suppose that you have a dataset $(X, y)$ consisting of samples $(x_i, y_i)_{i = 1}^m$, with $x_i \in \mathbb{R}^n$, $y_i \in \mathbb{R}$. I assume you think of linear regression as the following problem for the weights $w$ of the regression: $X w = y$ (or something similar, depending on how you stack the samples in the matrix).

This problem is ill-posed for the following reasons:

  • the solution is not unique if you have too few samples, namely $m < n$ (for example, there are infinitely many straight lines going through a given point in $\mathbb{R}^2$);
  • the solution may not exist if you have too many samples, namely $m > n$ (for example, in general there is no straight line going through three given points in $\mathbb{R}^2$, unless they happen to be collinear);
  • finally, there is the issue of multicollinearity: when the columns of $X$ are nearly linearly dependent, a small change in the observations can change the weight estimates drastically, which in practice violates the last condition (the solution should depend continuously on the data).
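
The three failure modes listed above can be checked numerically. The following is only a minimal sketch with NumPy; the tiny datasets and all variable names are made up for illustration, and each row of `X` is assumed to be $[1, x_i]$, so that $w$ holds an intercept and a slope.

```python
import numpy as np

# Too few samples (m = 1 < n = 2): a single point (1, 2), model y = w0 + w1 * x.
X_under = np.array([[1.0, 1.0]])
y_under = np.array([2.0])
w_a = np.array([2.0, 0.0])  # the horizontal line y = 2
w_b = np.array([0.0, 2.0])  # the line y = 2x
both_fit = bool(
    np.allclose(X_under @ w_a, y_under) and np.allclose(X_under @ w_b, y_under)
)

# Too many samples (m = 3 > n = 2): the points (0,0), (1,1), (2,0) are not collinear.
X_over = np.array([[1.0, 0.0], [1.0, 1.0], [1.0, 2.0]])
y_over = np.array([0.0, 1.0, 0.0])
rank_X = np.linalg.matrix_rank(X_over)
rank_aug = np.linalg.matrix_rank(np.column_stack([X_over, y_over]))
# X w = y is solvable exactly iff rank(X) == rank([X | y]) (Rouché-Capelli).
has_exact_solution = bool(rank_X == rank_aug)

# Multicollinearity: two nearly identical columns (they differ by eps in one entry).
eps = 1e-6
X_coll = np.array([[1.0, 1.0], [1.0, 1.0 + eps]])
y1 = np.array([2.0, 2.0])
y2 = np.array([2.0, 2.0 + 1e-3])  # a tiny perturbation of y1
w1 = np.linalg.solve(X_coll, y1)
w2 = np.linalg.solve(X_coll, y2)
change_in_y = np.linalg.norm(y2 - y1)
change_in_w = np.linalg.norm(w2 - w1)

print("two different weight vectors fit the single point:", both_fit)
print("exact solution exists for three points:", has_exact_solution)
print("change in y:", change_in_y, "-> change in w:", change_in_w)
```

In the last case the perturbation of $y$ is of order $10^{-3}$ while the weights move by roughly $10^{3}$, and the amplification grows without bound as `eps` shrinks.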

However, the problem can be modified to restore these properties. Replacing it with $\|Xw - y\|_2^2 \to \min$ always yields a pseudo-solution (a least-squares solution), even when there is no exact solution. The problem is still ill-posed, though, because the minimizer need not be unique.
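
As a rough illustration of the least-squares reformulation, the sketch below uses `np.linalg.lstsq` on two made-up datasets: one with no exact solution, and one with a duplicated column where many weight vectors attain the same minimal loss. The data, the helper `loss`, and the variable names are assumptions for the example only.

```python
import numpy as np


def loss(X, y, w):
    """Squared residual norm ||Xw - y||_2^2."""
    return float(np.sum((X @ w - y) ** 2))


# No exact solution: points (0,0), (1,1), (2,0); rows of X are [1, x_i].
X_over = np.array([[1.0, 0.0], [1.0, 1.0], [1.0, 2.0]])
y_over = np.array([0.0, 1.0, 0.0])
w_ls, _, rank_over, _ = np.linalg.lstsq(X_over, y_over, rcond=None)

# Many minimizers: the second column of X duplicates the first (rank 1).
X_dup = np.array([[1.0, 1.0], [2.0, 2.0], [3.0, 3.0]])
y_dup = np.array([2.0, 4.0, 6.0])
# For a rank-deficient X, lstsq returns the minimizer of smallest Euclidean norm.
w_min_norm, _, rank_dup, _ = np.linalg.lstsq(X_dup, y_dup, rcond=None)
w_other = np.array([2.0, 0.0])  # a different vector with the same loss

print("least-squares fit for three points:", w_ls)
print("rank of the duplicated-column X:", rank_dup)
print("losses:", loss(X_dup, y_dup, w_min_norm), loss(X_dup, y_dup, w_other))
```

Any $w$ with $w_1 + w_2 = 2$ minimizes the loss in the second dataset, so the least-squares problem alone does not pin down a single answer.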

You can also add Tikhonov regularization (a.k.a. ridge in the regression setting) to get $\|Xw-y\|_2^2 + \lambda \|w\|_2^2 \to \min$. For $\lambda > 0$ this objective is strictly convex, so it has exactly one minimizer, which singles out one solution even when the original problem has many.
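
A minimal sketch of the ridge problem, assuming the standard closed form $w = (X^\top X + \lambda I)^{-1} X^\top y$ for $\lambda > 0$. The function name `ridge_solve`, the data, and the value of `lam` are hypothetical choices for illustration.

```python
import numpy as np


def ridge_solve(X, y, lam):
    """Solve (X^T X + lam * I) w = X^T y, the normal equations of the ridge problem."""
    n_features = X.shape[1]
    A = X.T @ X + lam * np.eye(n_features)
    return np.linalg.solve(A, X.T @ y)


# Duplicated columns: X^T X is singular, so plain least squares has many minimizers.
X_dup = np.array([[1.0, 1.0], [2.0, 2.0], [3.0, 3.0]])
y_dup = np.array([2.0, 4.0, 6.0])

lam = 0.1  # illustrative regularization strength
w_ridge = ridge_solve(X_dup, y_dup, lam)

# Adding lam * I shifts every eigenvalue of X^T X up by lam, so the
# regularized matrix is positive definite and the solution is unique.
min_eig = np.linalg.eigvalsh(X_dup.T @ X_dup + lam * np.eye(2)).min()

print("ridge weights:", w_ridge)
print("smallest eigenvalue of X^T X + lam * I:", min_eig)
```

Here the penalty splits the weight evenly between the two identical columns, and the result approaches the minimum-norm least-squares solution $(1, 1)$ as `lam` tends to zero.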