Solution Form of a Ridge Regression Problem

Question

The linear ridge regression loss function: $$ J(\beta)=\Sigma_{i=1}^n(x_i^T\beta-y_i)^2+\lambda\Sigma_{j=1}^p\beta_j^2= \Vert X\beta-Y \Vert^2 + \lambda\beta^TI\beta \text{ (matrix form)} $$ where $x_i$'s are the input vectors, $y_i$'s are the outputs (observations), $\beta$ is the vector of coefficients, and $\beta_j$'s are the elemenents of $\beta$, has the solution: $$ \hat{\beta}=(X^TX+\lambda I)^{-1}X^TY $$

On the other hand, in my textbook, it is said that by setting the derivative of $J(\beta)$ to $0$, we can obtain the solution $\hat{\beta}$ of the form: $$ \hat{\beta}=\Sigma_{i=1}^n \alpha_ix_i \tag{*} $$ where: $$ \alpha_i=\frac{-1}{\lambda}(x_i^T\beta-y_i) $$

How do we obtain (*)?

@AhmadBazzi yes, that’s actually what is stated in the textbook. So I don’t think there is a mistake. — A Slow Learner, Mar 27 '20 at 09:25
@AhmadBazzi It's a fixed-point equation for the solution $\beta$. See my answer below. I think the confusion here is due to the fact that the notations $\beta$ and $\hat{\beta}$ are used in the same equation. — dohmatob, Mar 27 '20 at 09:27

dohmatob · Answer 1 · 2020-03-27T09:30:14.463

-1

Since $J(\beta) = \frac{1}{2}\sum_{i=1}^n(x_i^T\beta-y_i)^2 + \frac{1}{2}\lambda\beta^T\beta$, we have $\nabla J(\beta)=\sum_{i=1}^n(x_i^T\beta-y)x_i + \lambda \beta$. Thus setting $\nabla J(\beta)$ to zero gives $\sum_{i=1}^n(x_i^T\beta-y)x_i + \lambda \beta = 0$, that is $\beta = -\frac{1}{\lambda}\sum_{i=1}^n(x_i^T\beta-y)x_i$, a fixed-point equation for the coefficients $\beta$.

edited Mar 27 '20 at 09:30

answered Mar 27 '20 at 09:25

dohmatob

9,535

Why do we want that fixed-point equation instead of the explicit solution for $\beta$? – A Slow Learner Mar 27 '20 at 09:33
Well, your comment doesn't make much sense. You seem to be just out to chase the points. Have a good day! – dohmatob Mar 27 '20 at 09:42

score -1 · Answer 2 · answered Mar 27 '20 at 09:29

-1

Setting the derivative, we get $$2\sum\limits_{i=1}^n(x_i^T \beta - y_i)x_i + 2 \lambda \beta = 0$$ Expressing this first order condition in fixed point, we arrive at the desired result $$\hat{\beta} = \sum\limits_{i=1}^n\underbrace{-\frac{1}{\lambda}(x_i^T \beta - y_i)}_{\alpha_i}x_i $$

answered Mar 27 '20 at 09:29

Ahmad Bazzi

12,076

Well, this is exactly the same answer I posted earlier. what's the point of repeating it here as a separate answer ? – dohmatob Mar 27 '20 at 09:31
What’s the point of writing $\beta$ under this form? – A Slow Learner Mar 27 '20 at 09:42
@ASlowLearner well if you start from some initial point $\beta$ you could solve this recursively as $$\hat{\beta}^{(n+1)} = \sum\limits_{i=1}^n\underbrace{-\frac{1}{\lambda}(x_i^T \beta^{(n)} - y_i)}_{\alpha_i}x_i $$ This may turn out faster than direct computation in matrix form since we do not have a matrix inversion here or division (other than the division by $\lambda$ which is computed once) – Ahmad Bazzi Mar 27 '20 at 09:44

Solution Form of a Ridge Regression Problem

2 Answers2