I want to fully understand the probabilistic interpretation of linear regression. I know that once we have a probabilistic model, we differentiate the likelihood to find the maximum-likelihood weights/regressors. What I find difficult to grasp is how exactly we develop a probabilistic model for linear regression in the first place.
I have seen that we start by writing: $y_i = w^T x_i + \epsilon_i$. (1)
Here I want to know: what is $y_i$? Is it the observed value? If so, how can we model it as random? Where does the randomness come from? What is $\epsilon_i$? Is it error or noise?
Please correct me if I am wrong:
What I understand is that our measured data is noisy, i.e., for the same $x_i$, $y_i$ can vary from one draw of samples to another, which is due to some inherent randomness in $y_i$. This randomness is what we quantify using $\epsilon_i \sim N(0,\sigma^2)$. Hence $y_i$ is a normal random variable given $x_i$, with mean $w^T x_i$. So we want to maximize the likelihood, meaning we maximize the probability that $y_i$ takes the value we have in our current experimental data, given $x_i$. This probability happens to be parameterized by $w$ due to (1).
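To make my understanding concrete, here is how I would write it out. This is only a sketch under my own assumptions: that the $\epsilon_i$ are independent across observations, that $\sigma^2$ is fixed, and that $n$ (my notation) is the number of data points. Since $y_i$ is continuous, I believe the quantity being maximized is strictly a density rather than a probability:

$$p(y_i \mid x_i; w) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left(-\frac{(y_i - w^T x_i)^2}{2\sigma^2}\right)$$

$$L(w) = \prod_{i=1}^{n} p(y_i \mid x_i; w)$$

$$\log L(w) = -\frac{n}{2}\log(2\pi\sigma^2) - \frac{1}{2\sigma^2}\sum_{i=1}^{n} (y_i - w^T x_i)^2$$

If this is right, the first term does not depend on $w$, so maximizing $\log L(w)$ over $w$ would be the same as minimizing the sum of squared residuals $\sum_{i=1}^{n} (y_i - w^T x_i)^2$.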