I have a data set that I have used to calculate the coefficients for a linear regression. The data set is of the form $\lbrace x_i,y_i\rbrace_{i=1}^{n}$.

Let $$Y = \alpha + \beta X + Z$$ where $\text{corr}(X,Z) = 0$ and $Z \sim N(0,\sigma_Z^2)$, with constant $\sigma_Z^2$.

To calculate $\alpha$ and $\beta$, I had to assume $Z$ is zero. I then could find them by

$$\beta = \frac{\sum_{i=1}^n (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^n (x_i - \bar{x})^2}$$ and, assuming $Z=0$, $$\alpha = \bar{y}-\beta \bar{x}$$ I am fairly certain this assumption is correct, since the numbers I got match the rest of the problem. However, I don't understand why I can assume this.

Why can I assume $Z=0$?

  • I don't know what you mean with OLS, does that mean that you have a precise model for $X, \alpha, \beta$ and that you will minimize some given objective function for estimating its parameters ? – reuns Apr 18 '16 at 06:23
  • no you didn't : how do you estimate $\alpha,\beta$ ? – reuns Apr 18 '16 at 06:46
  • okay, i updated again. see if it is satisfactory – Stan Shunpike Apr 18 '16 at 06:52
  • yes but how do you derive that $\alpha,\beta$ are given by those expressions ? – reuns Apr 18 '16 at 07:13
  • Unfortunately, I have no idea what you are asking. Or rather, that's not a question I'm in a position to answer, for if I could, I doubt I would be confused. – Stan Shunpike Apr 18 '16 at 07:18
  • in the usual linear regression, we are searching for $\alpha,\beta$ minimizing the objective function $J(\alpha,\beta) = \sum_{i=1}^n (y_i - (\alpha + \beta x_i))^2$ (the ordinary least square) that's where your formulas come from : $\beta = \frac{\sum_{i=1}^n (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^n (x_i - \bar{x})^2}$ and $\alpha = \bar{y}-\beta \bar{x}$ are the minimizers of $J$. – reuns Apr 18 '16 at 07:27
  • now replace $y_i$ by $y_i+Z$ (that's what you are doing by adding $Z$ to your model for $Y$) hence you get $(y_i + Z- (\alpha + \beta x_i))^2 = (y_i - (\alpha + \beta x_i))^2 + Z^2 + 2 Z(y_i - (\alpha + \beta x_i))$, and $Z^2$ doesn't depend on $\alpha,\beta$ hence it doesn't matter in the minimization, and $2 Z(y_i - (\alpha + \beta x_i))$ is (in mean value) $0$ since $Z$ and $(y_i - (\alpha + \beta x_i))$ are nearly perfectly decorrelated, right ? – reuns Apr 18 '16 at 07:30
  • hence when you are minimizing $\min_{\alpha, \beta} \sum_{i=1}^n (y_i + Z- (\alpha + \beta x_i))^2$ it is virtually equivalent to minimizing $\min_{\alpha, \beta} \sum_{i=1}^n (y_i - (\alpha + \beta x_i))^2$, and you get the same $\alpha,\beta$ with or without $Z$ (whenever $Z$ is decorrelated of $y_i - (\alpha + \beta x_i)$ and zero mean !) – reuns Apr 18 '16 at 07:31
  • Why don't u make that an answer. Makes tons of sense – Stan Shunpike Apr 18 '16 at 07:33
  • make yourself the answer :) – reuns Apr 18 '16 at 07:35
  • Why does now $Z$ and $(y_i - (\alpha + \beta x_i))$ being perfectly decorrelated mean that term is zero? – Stan Shunpike Apr 18 '16 at 07:56

1 Answer

In the usual linear regression, our goal is to choose $\alpha,\beta$ to minimize the objective function $$J(\alpha,\beta) = \sum_{i=1}^n (y_i - (\alpha + \beta x_i))^2$$ (ordinary least squares). The minimizers give the line $y = \beta x + \alpha$ that best fits the data set.
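
For reference, here is a sketch of how the closed-form expressions quoted in the question follow from minimizing $J$. It is the standard calculus argument, not something specific to this problem, and it assumes the $x_i$ are not all equal, so that the denominator is nonzero. Setting both partial derivatives to zero gives $$\frac{\partial J}{\partial \alpha} = -2\sum_{i=1}^n \bigl(y_i - (\alpha + \beta x_i)\bigr) = 0 \quad\Longrightarrow\quad \alpha = \bar{y} - \beta \bar{x}$$ and $$\frac{\partial J}{\partial \beta} = -2\sum_{i=1}^n x_i \bigl(y_i - (\alpha + \beta x_i)\bigr) = 0.$$ Substituting $\alpha = \bar{y} - \beta\bar{x}$ into the second equation yields $$\sum_{i=1}^n x_i \bigl((y_i - \bar{y}) - \beta (x_i - \bar{x})\bigr) = 0.$$ Because $\sum_{i=1}^n (y_i - \bar{y}) = 0$ and $\sum_{i=1}^n (x_i - \bar{x}) = 0$, replacing the leading factor $x_i$ by $x_i - \bar{x}$ does not change the sum, so $$\sum_{i=1}^n (x_i - \bar{x})(y_i - \bar{y}) = \beta \sum_{i=1}^n (x_i - \bar{x})^2 \quad\Longrightarrow\quad \beta = \frac{\sum_{i=1}^n (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^n (x_i - \bar{x})^2}.$$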

Adding the noise term $Z$ to the model means we should replace $y_i$ by $y_i+Z$. Plugging this into the summand of $J$ and expanding yields $$(y_i + Z- (\alpha + \beta x_i))^2 = (y_i - (\alpha + \beta x_i))^2 + Z^2 + 2 Z(y_i - (\alpha + \beta x_i))$$

Now $Z^2$ doesn't depend on $\alpha$ and $\beta$, so it drops out of the minimization. The cross term $2 Z(y_i - (\alpha + \beta x_i))$ has mean value $0$: writing $r_i = y_i - (\alpha + \beta x_i)$, we have $E[Z r_i] = \operatorname{Cov}(Z, r_i) + E[Z]\,E[r_i] = 0$, because $Z$ and $r_i$ are uncorrelated and $Z$ has zero mean. Hence solving $$\min_{\alpha, \beta} \sum_{i=1}^n (y_i + Z- (\alpha + \beta x_i))^2$$ is, on average, equivalent to solving $$\min_{\alpha, \beta} \sum_{i=1}^n (y_i - (\alpha + \beta x_i))^2$$ and therefore the same $\alpha,\beta$ are chosen with or without $Z$ (provided $Z$ is uncorrelated with $y_i - (\alpha + \beta x_i)$ and has zero mean). In other words, you never need to assume $Z=0$; it is enough that $Z$ averages out.
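
A quick numerical check can make this concrete. The sketch below is only an illustration: the "true" parameters, the sample size, the range of $x$ and the helper name `ols_fit` are all assumptions chosen for the example, not part of the original problem. It fits the same $x$ values once without noise and once with zero-mean Gaussian noise drawn independently of $x$, using the closed-form expressions for $\alpha$ and $\beta$.

```python
import random


def ols_fit(xs, ys):
    """Return (alpha, beta) minimizing sum((y - (alpha + beta * x)) ** 2)."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    beta = s_xy / s_xx
    alpha = y_bar - beta * x_bar
    return alpha, beta


# Hypothetical "true" parameters, chosen only for this illustration.
TRUE_ALPHA, TRUE_BETA, SIGMA_Z = 2.0, 0.5, 1.0

rng = random.Random(0)  # fixed seed so the run is reproducible
n = 10_000
xs = [rng.uniform(0.0, 10.0) for _ in range(n)]
# Zero-mean noise, drawn independently of x (so uncorrelated with it).
zs = [rng.gauss(0.0, SIGMA_Z) for _ in range(n)]

ys_clean = [TRUE_ALPHA + TRUE_BETA * x for x in xs]   # model without Z
ys_noisy = [y + z for y, z in zip(ys_clean, zs)]      # model with Z added

alpha_clean, beta_clean = ols_fit(xs, ys_clean)
alpha_noisy, beta_noisy = ols_fit(xs, ys_noisy)

print(f"without Z: alpha={alpha_clean:.3f}, beta={beta_clean:.3f}")
print(f"with Z:    alpha={alpha_noisy:.3f}, beta={beta_noisy:.3f}")
```

With noise the estimates are not exactly equal to the noiseless ones; they differ by a small sampling error that shrinks as $n$ grows, which is what "equivalent on average" means here.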