1

Given two independent, standard normally distributed random variables $x,y\sim \mathcal{N}(0,1)$, I would like to fit a univariate linear regression without intercept, $Y = X \cdot \beta + \epsilon$. R gives me the estimate $\hat{\beta} \approx 0$:

    n <- 10000
    x <- rnorm(n)
    y <- rnorm(n)
    plot(x, y)

[Figure: scatter plot of x against y]

    fit <- lm(y ~ 0 + x)
    summary(fit)

However, I feel the problem is not well-defined: because the data are invariant under a rotation of the coordinate system, any $\beta \in \mathbb{R}$ appears to minimize the expected least-squares error. Any thoughts on why $\hat{y} = 0$ minimizes the least-squares criterion rather than $\hat{y} = \hat{\beta} \cdot x$ for an arbitrary $\hat{\beta} \in \mathbb{R}$?

PT272
  • 309
  • Are you asking to fit the same model $Y=\beta \cdot X + \epsilon$ using a different minimization criterion? From the picture, $\beta = 0$ seems plausible, because of the rough symmetry about the origin. – hardmath Jul 15 '16 at 16:26
  • No, I am not interested in Laplacian, Huber, or other criteria. The cost function is just the standard expected least-squares criterion, which should be minimized. – PT272 Jul 15 '16 at 16:50
  • The data are rotation invariant but the cost function (and therefore the solution) is not. – zyx Jul 15 '16 at 17:31
  • I edited the question already to clarify it... – PT272 Jul 19 '16 at 09:51
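
The point made in the comments (rotation-invariant data, but a cost function that is not) can be checked numerically. The following is only an illustrative sketch, not part of the original thread: the seed, the 30-degree rotation angle, and the helper name `mse` are arbitrary choices made here. It rotates the sample, refits the no-intercept model, and evaluates the mean squared vertical residual for a few candidate slopes.

```r
set.seed(1)                      # arbitrary seed, for reproducibility only
n <- 10000
x <- rnorm(n)
y <- rnorm(n)

# Rotate every point by an arbitrary angle (30 degrees is an assumption).
theta <- pi / 6
x_rot <- cos(theta) * x - sin(theta) * y
y_rot <- sin(theta) * x + cos(theta) * y

# The rotated cloud has the same distribution, so the refitted slope stays
# near 0; the fitted line does not rotate along with the coordinates.
slope_orig <- unname(coef(lm(y ~ 0 + x)))
slope_rot  <- unname(coef(lm(y_rot ~ 0 + x_rot)))

# Mean squared vertical residual for a candidate slope b (hypothetical helper).
mse <- function(b) mean((y - b * x)^2)
betas <- c(-2, -1, 0, 1, 2)
costs <- sapply(betas, mse)

print(c(original = slope_orig, rotated = slope_rot))
print(rbind(beta = betas, cost = costs))   # cost grows as the slope moves away from 0
```

Because least squares penalizes vertical residuals only, a steeper line through the round cloud always pays more, even though the cloud itself looks the same from every direction.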

2 Answers

2

As @hardmath mentioned in the comments, the result is perfectly logical. If $X$ and $Y$ are independent and each is $\mathcal{N}(0,1)$, then independence gives $\operatorname{cov}(X,Y)=0$, so the true slope $\operatorname{cov}(X,Y)/\operatorname{var}(X)$ is $0$. The true intercept is also $0$, because the regression line passes through $(\mathbb{E}X, \mathbb{E}Y) = (0,0)$. Hence the true regression model is simply $y=0+0\cdot x+\epsilon=\epsilon$ with $\epsilon \sim \mathcal{N}(0,1)$, which agrees with the OLS result.
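
The same conclusion can be read off the expected squared error directly. As a short worked derivation, using only the independence and unit-variance assumptions stated in the question:

$$\mathbb{E}\big[(Y-\beta X)^2\big] = \mathbb{E}[Y^2] - 2\beta\,\mathbb{E}[XY] + \beta^2\,\mathbb{E}[X^2] = 1 - 2\beta\cdot 0 + \beta^2 = 1+\beta^2 .$$

This is uniquely minimized at $\beta = 0$, so it is not the case that every $\beta \in \mathbb{R}$ minimizes the criterion.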

V. Vancak
  • 16,444
1

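If the regression data are symmetric with respect to changing the sign of $y$, the least-squares approximation is the line $y=0$. Pair each data point $(x,a)$ with its mirror image $(x,-a)$: for a fitted value $y$ at that $x$, the error is a sum of pairs $(y+a)^2 + (y-a)^2 = 2y^2 + 2a^2$, all of which are minimized at $y=0$.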

If the data are samples from a symmetric distribution, then $y=0$ is the expected regression line, and the line fitted to any actual sample will be a small random perturbation of it.
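
To get a feel for how small that perturbation is, one can refit the model on many fresh samples. This is a rough sketch under assumptions made here, not taken from the answer: the seed, the number of repetitions `reps`, and the helper name `one_slope` are arbitrary. It uses the closed-form no-intercept OLS slope $\sum x_i y_i / \sum x_i^2$, which is what `lm(y ~ 0 + x)` computes.

```r
set.seed(2)      # arbitrary seed, for reproducibility only
n <- 10000       # sample size, as in the question
reps <- 200      # number of simulated samples; arbitrary choice

# Slope of the no-intercept fit y ~ 0 + x on one fresh sample (hypothetical helper).
one_slope <- function() {
  x <- rnorm(n)
  y <- rnorm(n)
  sum(x * y) / sum(x^2)   # closed form of the OLS slope without intercept
}

slopes <- replicate(reps, one_slope())
print(mean(slopes))   # average fitted slope across samples
print(sd(slopes))     # spread of the fitted slopes
print(1 / sqrt(n))    # reference scale for comparison
```

In this setup the fitted slopes scatter around $0$ with a spread on the order of $1/\sqrt{n}$, which is why a single fit with $n = 10000$ reports a slope very close to, but not exactly, zero.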

zyx
  • 35,436