Matrix regression proof that $\hat \beta = (X' X)^{-1} X' Y = {\hat \beta_0 \choose \hat \beta_1} $

where $\hat\beta$ is the least squares estimator of $\beta$.

attempt

So I know ${\hat \beta_0 \choose \hat \beta_1} = {\overline{Y} - \hat \beta_1 \overline{X} \choose \frac{\sum_{i=1}^{n} (X_i - \overline{X})(Y_i - \overline{Y})}{\sum_{i=1}^{n}(X_i - \overline{X})^2}}$

I'm not really sure how to start, as I don't know which formulas can be used to reduce any of this. If this was answered elsewhere, please mark it as a duplicate; I tried searching but couldn't find anything.

bob
  • 5
  • See e.g. here: https://stats.stackexchange.com/questions/46151/how-to-derive-the-least-square-estimator-for-multiple-linear-regression or https://stats.stackexchange.com/questions/186196/understanding-linear-algebra-in-ordinary-least-squares-derivation. – Minus One-Twelfth Jun 30 '19 at 08:08
  • The steps there are basically 1) Recall that the least squares estimator is chosen to minimise (with respect to $\beta$) the function $$S(\beta):= (y-X\beta)^T (y - X\beta);$$ 2) expand this to show that $$S(\beta) = y^T y - 2y^T X \beta + \beta^T X^T X \beta;$$ 3) use matrix calculus to find the $\beta$ that minimises this (calculate $\frac{\partial S}{\partial \beta}$, set to $\mathbf{0}$ and solve for $\beta$). – Minus One-Twelfth Jun 30 '19 at 08:13

2 Answers

Our goal is to minimize $$ f(\beta) = \frac12 \| X \beta - Y \|^2. $$ Notice that $f = g \circ h$, where $h(\beta) = X \beta - Y$ and $g(u) = \frac12 \| u \|^2$. The derivatives of $g$ and $h$ are given by $$ g'(u) = u^T, \quad h'(\beta) = X. $$ By the chain rule, we have \begin{align} f'(\beta) &= g'(h(\beta)) h'(\beta) \\ &= (X \beta - Y)^T X. \end{align} The gradient of $f$ is $$ \nabla f(\beta) = f'(\beta)^T = X^T( X \beta - Y). $$ Setting the gradient of $f$ equal to $0$, we discover that $$ X^T X \beta = X^T Y. $$
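As a numerical sanity check, here is a minimal sketch that solves the normal equations $X^T X \beta = X^T Y$ for simple regression and compares the result with the textbook formulas for $\hat\beta_0$ and $\hat\beta_1$. It assumes the design matrix $X$ has rows $(1, X_i)$, which the question implies but does not write out; the function names and the data are made up for illustration.

```python
def ols_closed_form(x, y):
    """Textbook simple-regression estimates (beta0_hat, beta1_hat)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    beta1 = sxy / sxx
    return y_bar - beta1 * x_bar, beta1


def ols_normal_equations(x, y):
    """Solve X'X beta = X'Y, assuming X has rows (1, x_i)."""
    n = len(x)
    sx = sum(x)
    sxx = sum(xi * xi for xi in x)
    sy = sum(y)
    sxy = sum(xi * yi for xi, yi in zip(x, y))
    # X'X = [[n, sx], [sx, sxx]] and X'Y = [sy, sxy];
    # the 2x2 system is solved with the explicit inverse.
    det = n * sxx - sx * sx
    beta0 = (sxx * sy - sx * sxy) / det
    beta1 = (n * sxy - sx * sy) / det
    return beta0, beta1


# Made-up illustrative data, not taken from the question.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
print(ols_closed_form(x, y))
print(ols_normal_equations(x, y))
```

Both calls should print the same pair up to floating-point rounding. The explicit 2x2 inverse is used only to keep the sketch dependency-free; it requires the $X_i$ not to be all equal, since otherwise $X^T X$ is singular.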

littleO
  • 51,938

In a slight variant on @MinusOne-Twelfth's comment, $$\frac{\partial}{\partial\beta_i}(y-X\beta)_j=-X_{ji}\implies\frac{\partial}{\partial\beta_i}\sum_j(y-X\beta)_j^2=2\sum_j(X^\prime)_{ij}(X\beta-y)_j=2(X^\prime X\beta-X^\prime y)_i.$$ Setting this to $0$ for all $i$, $$X^\prime X\beta=X^\prime y\implies\beta=(X^\prime X)^{-1}X^\prime y,$$ provided $X^\prime X$ is invertible.

J.G.
  • 115,835
  • I'm confused how finding $\beta$ equals ${\beta_0 \choose \beta_1}$? – bob Jun 30 '19 at 23:04
  • @bob The easiest option is to double-check $X^{\prime}X\left(\begin{array}{c} \beta_{0}\\ \beta_{1} \end{array}\right)=X^{\prime}y$. – J.G. Jul 01 '19 at 05:28
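
The double-check suggested in the last comment can be written out as follows. This is a sketch under the assumption that the model is simple linear regression, so that $X$ is the $n \times 2$ matrix with rows $(1, X_i)$; the question does not state $X$ explicitly. Then

$$X'X = \begin{pmatrix} n & \sum_i X_i \\ \sum_i X_i & \sum_i X_i^2 \end{pmatrix}, \qquad X'Y = \begin{pmatrix} \sum_i Y_i \\ \sum_i X_i Y_i \end{pmatrix}.$$

The determinant is $n\sum_i X_i^2 - \left(\sum_i X_i\right)^2 = n\sum_i (X_i - \overline{X})^2$, so when the $X_i$ are not all equal,

$$(X'X)^{-1} = \frac{1}{n\sum_i (X_i - \overline{X})^2} \begin{pmatrix} \sum_i X_i^2 & -\sum_i X_i \\ -\sum_i X_i & n \end{pmatrix}.$$

The second component of $(X'X)^{-1}X'Y$ is therefore

$$\hat\beta_1 = \frac{n\sum_i X_i Y_i - \sum_i X_i \sum_i Y_i}{n\sum_i (X_i - \overline{X})^2} = \frac{\sum_i (X_i - \overline{X})(Y_i - \overline{Y})}{\sum_i (X_i - \overline{X})^2},$$

using $\sum_i (X_i - \overline{X})(Y_i - \overline{Y}) = \sum_i X_i Y_i - n\overline{X}\,\overline{Y}$. The first row of $X'X\hat\beta = X'Y$ reads $n\hat\beta_0 + \hat\beta_1 \sum_i X_i = \sum_i Y_i$, and dividing by $n$ gives

$$\hat\beta_0 = \overline{Y} - \hat\beta_1 \overline{X}.$$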