Questions tagged [regression]

This tag is for questions on (linear or nonlinear) regression, which is a way of describing how one variable, the outcome, is numerically related to one or more predictor variables. The outcome is also called the dependent or response variable, is usually denoted $~Y~$, and is plotted on the vertical axis (ordinate) of a graph.

Regression is a statistical method used in finance, investing and other disciplines that attempts to determine the strength of the relationship between one dependent variable (usually denoted by $~Y~$) and a series of other changing variables (known as independent variables).

Types of regression:

  • Linear regression
  • Logistic regression
  • Polynomial regression
  • Stepwise regression
  • Ridge regression
  • Lasso regression
  • ElasticNet regression

The two basic types of regression are simple linear regression (one predictor) and multiple linear regression (several predictors).

The general form of each type of regression is:

  • Linear regression: $~Y = a + b~X + u~$
  • Multiple regression: $~Y = a + b_1~X_1 + b_2~X_2 + b_3~X_3 + \dots + b_t~X_t + u~$

Where:

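  • $Y =~$ the variable that you are trying to predict (dependent variable).
  • $X =~$ the variable that you are using to predict $Y$ (independent variable; $X_1, \dots, X_t$ in the multiple case).
  • $a =~$ the intercept.
  • $b =~$ the slope ($b_1, \dots, b_t$ are the coefficients in the multiple case).
  • $u =~$ the error term (regression residual).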
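
As a rough illustration of the two forms above, here is a minimal sketch that estimates $a$ and $b$ (or $b_1, \dots, b_t$) by ordinary least squares with NumPy. The simulated data, the "true" coefficient values, and all variable names are assumptions made up for this example; they do not come from any question under this tag.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed so the sketch is reproducible
n = 200

# --- Simple linear regression: Y = a + b*X + u ---
X = rng.uniform(0, 10, size=n)           # hypothetical predictor
u = rng.normal(0, 0.5, size=n)           # hypothetical error term
Y = 2.0 + 3.0 * X + u                    # assumed true a = 2, b = 3

# Design matrix: a column of ones for the intercept, then X.
A = np.column_stack([np.ones(n), X])
coef, *_ = np.linalg.lstsq(A, Y, rcond=None)
a_hat, b_hat = coef

# Fitted residuals (the sample counterpart of u).
residuals = Y - A @ coef

# --- Multiple regression: Y = a + b_1*X_1 + ... + b_t*X_t + u ---
X_multi = rng.normal(size=(n, 3))        # three hypothetical predictors
true_b = np.array([1.5, -2.0, 0.5])      # assumed true b_1, b_2, b_3
Y_multi = 4.0 + X_multi @ true_b + rng.normal(0, 0.5, size=n)

A_multi = np.column_stack([np.ones(n), X_multi])
coef_multi, *_ = np.linalg.lstsq(A_multi, Y_multi, rcond=None)

print("simple:   a =", a_hat, " b =", b_hat)
print("multiple: a =", coef_multi[0], " b =", coef_multi[1:])
```

`np.linalg.lstsq` is used here instead of forming $(X'X)^{-1}X'y$ explicitly, since it tends to be better behaved numerically when predictors are close to collinear. Because the design matrix includes an intercept column, the fitted residuals sum to zero up to rounding error.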

Regression analysis has several benefits:

$1.~$ It indicates whether there are significant relationships between the dependent variable and the independent variables.

$2.~$ It indicates the strength of the impact of multiple independent variables on a dependent variable.

Reference:

https://en.wikipedia.org/wiki/Regression_analysis

This tag often goes along with the tag.

2700 questions
1
vote
2 answers

Polynomial regression - correctness and accuracy

I have just finished writing code that performs polynomial regression, computing $(X'X)^{-1}X'y$ (where $X'$ is the transpose) to estimate the vector of coefficients. Now I'd like to add some check procedures to assert that everything is correct and that the…
CTZStef
  • 144
1
vote
0 answers

Linear Regression with limited information

You have grades ($Y $) for men ($D = 0$) and women ($D = 1$). The mean grades (out of total possible score of 100) are 65 for men and 72 for women. Regression of $Y$ on $D$ yields: $Y_i = b_0 + b_1D_i + e_i $. (a) What values would you get for $b_0$…
delerion
  • 11
  • 1
1
vote
1 answer

Implicit Curve Fitting

I have 100 points scattered in 3D space along the $z$ coordinate axis. The points appear to lie on a curve. Is it possible to find an (implicit) curve that fits these points, with the option to insert a coordinate $z$ and get the pair $(x,y)$ from the…
home99
  • 11
1
vote
1 answer

Loss function for regression - vanishing cross term

[From PRML Bishop, p:48] I do not understand how the cross term vanishes in the integration. I have tried writing this out, but it does not really make sense to me. The same operation happens in "Gaussian Processes for Machine Learning" by…
fynsta
  • 132
  • 6
1
vote
0 answers

How to approximate a pseudo-periodic curve?

My data describes a pseudo-periodic behaviour: the amplitude seems stable, but the frequency decreases with X, so the pseudo-period increases exponentially. What I mean by that is that I measured that for any X I could expect a full…
1
vote
2 answers

How to interpret this interaction in this regression?

(This is simulated data without error variance.) So, I have the model: y = x + gender (categorical variable, effect coded) + interaction (x and gender). x, gender and the interaction were all significant and the plot y and x and gender. But if I see the…
yoo
  • 133
1
vote
1 answer

How does LASSO select among heavily collinear features (randomly?)

We know that for a group of heavily collinear features, LASSO will mostly select one of them and set the others to zero. Many references say that the selected feature is determined randomly by LASSO. I am not sure about this conclusion and don't…
1
vote
1 answer

Variation depending on which variable is treated as the dependent variable

I need assistance to figure out if the following statement is true: The proportion of variation in the dependent variable explained by fitting the simple linear regression model does not depend on which variable is treated as the…
joseph
  • 135
1
vote
1 answer

Linear fit with horizontal and vertical error bars

I'm searching for an equation to calculate the parameters of a linear fit. With parameters a and b, the $\chi ^{2}$ is used: $\chi ^{2} = \sum_{i=0}^N (y_{i}-a.x_{i}-b)^{2}$ And with errors: $\chi ^{2} = \sum_{i=0}^N…
1
vote
0 answers

Regression coefficient

Consider observations on three variables X1, X2 and X3. Suppose that X1 is regressed on X2. When the residual of the above regression is regressed on X3, the regression coefficient of X3 is b3. When X1 is regressed on X2 and X3 simultaneously, the…
kris91
  • 401
1
vote
1 answer

Initializing Variables using Shrinkage

I have a user-user model in which users can rate their friendships (r) with others and can also have activities with them (a). I am using Matrix Factorization and Gradient Descent for updating the vectors. Now, I am randomly initializing my 2…
AliBZ
  • 111
1
vote
0 answers

About the weights assigned in the linear regression

I have this confusion related to linear regression. Let's say I have two predictors $x_1$ and $x_2$ and the target is $y$. I learn a linear regression with $y \sim x_1,x_1 \cdot x_2,x_2$ with $x_1 \cdot x_2$ being the interaction term. Let's suppose I…
user34790
  • 4,192
1
vote
1 answer

RSS and b1 are independently distributed

Consider the simple linear regression model $$y=\beta_0+\beta_1X+e.$$ We observe a sample of $n$ observations $(x_i,y_i)$, $i=1,2,\cdots,n$, so we can write $$y_i=\beta_0+\beta_1x_i+e_i,$$ where $$e_i \sim N(0,\sigma^2) \text{ i.i.d.}$$ Using the least…
1
vote
1 answer

Proof that a and b in linear regression are random variables

Does anyone know how to prove that the variables $a$ and $b$ that are used in linear regression are random variables? For me the assumption would be that these are dependent on the values of $x$ and $y$ which are simply random variables that can…
1
vote
0 answers

Correlation coefficient.

A linear regression gives us a correlation coefficient $r=0$. What is the equation of the best fit line? Give an example of data with $r=0$. What is the value of the correlation coefficient of data on a line parallel to the x-axis or y-axis? I…