0

The linear ridge regression loss function: $$ J(\beta)=\Sigma_{i=1}^n(x_i^T\beta-y_i)^2+\lambda\Sigma_{j=1}^p\beta_j^2= \Vert X\beta-Y \Vert^2 + \lambda\beta^TI\beta \text{ (matrix form)} $$ where $x_i$'s are the input vectors, $y_i$'s are the outputs (observations), $\beta$ is the vector of coefficients, and $\beta_j$'s are the elemenents of $\beta$, has the solution: $$ \hat{\beta}=(X^TX+\lambda I)^{-1}X^TY $$

On the other hand, in my textbook, it is said that by setting the derivative of $J(\beta)$ to $0$, we can obtain the solution $\hat{\beta}$ of the form: $$ \hat{\beta}=\Sigma_{i=1}^n \alpha_ix_i \tag{*} $$ where: $$ \alpha_i=\frac{-1}{\lambda}(x_i^T\beta-y_i) $$

How do we obtain (*)?

Ahmad Bazzi
  • 12,076
  • @AhmadBazzi yes, that’s actually what is stated in the textbook. So I don’t think there is a mistake. – A Slow Learner Mar 27 '20 at 09:25
  • @AhmadBazzi It's a fixed-point equation for the solution $\beta$. See my answer below. I think the confusion here is due to the fact that the notations $\beta$ and $\hat{\beta}$ are used in the same equation. – dohmatob Mar 27 '20 at 09:27

2 Answers2

-1

Since $J(\beta) = \frac{1}{2}\sum_{i=1}^n(x_i^T\beta-y_i)^2 + \frac{1}{2}\lambda\beta^T\beta$, we have $\nabla J(\beta)=\sum_{i=1}^n(x_i^T\beta-y)x_i + \lambda \beta$. Thus setting $\nabla J(\beta)$ to zero gives $\sum_{i=1}^n(x_i^T\beta-y)x_i + \lambda \beta = 0$, that is $\beta = -\frac{1}{\lambda}\sum_{i=1}^n(x_i^T\beta-y)x_i$, a fixed-point equation for the coefficients $\beta$.

dohmatob
  • 9,535
-1

Setting the derivative, we get $$2\sum\limits_{i=1}^n(x_i^T \beta - y_i)x_i + 2 \lambda \beta = 0$$ Expressing this first order condition in fixed point, we arrive at the desired result $$\hat{\beta} = \sum\limits_{i=1}^n\underbrace{-\frac{1}{\lambda}(x_i^T \beta - y_i)}_{\alpha_i}x_i $$

Ahmad Bazzi
  • 12,076
  • Well, this is exactly the same answer I posted earlier. what's the point of repeating it here as a separate answer ? – dohmatob Mar 27 '20 at 09:31
  • What’s the point of writing $\beta$ under this form? – A Slow Learner Mar 27 '20 at 09:42
  • @ASlowLearner well if you start from some initial point $\beta$ you could solve this recursively as $$\hat{\beta}^{(n+1)} = \sum\limits_{i=1}^n\underbrace{-\frac{1}{\lambda}(x_i^T \beta^{(n)} - y_i)}_{\alpha_i}x_i $$ This may turn out faster than direct computation in matrix form since we do not have a matrix inversion here or division (other than the division by $\lambda$ which is computed once) – Ahmad Bazzi Mar 27 '20 at 09:44