When I have a table of values like \begin{array}{c|ccccc} x & 1 & 2 & 3 & 4 & 5 \\ y & 3 & 6 & 8 & 9 & 0 \\ y & 4 & 6 & 1 & 2 & 4 \end{array}

and I know that it follows a simple linear regression model, what is the value of $n$? I think it is either $5$ or $10$, but I am not sure which. I need the value to calculate the least squares estimates. Please explain.

Lucy
  • Could you explain the difference between the two $y$'s you posted above? In general we have one dependent variable $y$ and possibly multiple independent variables labeled $x_i$. Sometimes we also write $\hat{y}$ for the estimates of $y$. – JimmyJackson Dec 20 '13 at 05:40
  • What do you mean, $n$? I don't see an $n$ anywhere. – Newb Dec 20 '13 at 05:41
  • I would assume $n$ is 5, the number of observations of $x$ and $y$, but I am certain that one of the $y$'s above is a $\hat{y}$ – JimmyJackson Dec 20 '13 at 05:43
  • @JimmyJackson the question is: An experiment is conducted to devise a scale to measure the freshness of a plant after varying periods of time. $y$ is the freshness measurement and $x$ is the time period in days ... I don't think there's a $\hat{y}$ in there! – Lucy Dec 20 '13 at 05:55
  • @Newb, I am interested in finding $\hat{\beta_1}$, which is equal to $\dfrac{n\sum x_iy_i-\sum x_i\sum y_i}{n\sum x_i^2-\left(\sum x_i\right)^2}$. – Lucy Dec 20 '13 at 06:00

2 Answers

$n$ should be 5. The question might be asking you to fit a simple linear regression using one set of values for $y$ with $x$, and then fit it again using the other set of $y$ values with the same $x$ values. So, first focus on the 1st and 2nd rows only:

\begin{array}{c|ccccc} x & 1 & 2 & 3 & 4 & 5 \\ y & 3 & 6 & 8 & 9 & 0 \\ \end{array}

then you will have that

$$\hat{\beta_1 }= \frac{\sum_{i=1}^n(x_i-\bar{x})(y_i-\bar{y})}{\sum_{i=1}^n(x_i-\bar{x})^2}$$

and $$\hat{\beta_0}=\bar{y}-\hat{\beta_1}\bar{x}$$
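
As a side note, here is a short sketch of why this centered form agrees with the raw-sum formula quoted in the comments. It uses only $\sum_i x_i=n\bar{x}$ and $\sum_i y_i=n\bar{y}$:

\begin{align*} \sum_{i=1}^n(x_i-\bar{x})(y_i-\bar{y}) &= \sum_{i=1}^n x_iy_i-n\bar{x}\bar{y} = \frac{1}{n}\left[n\sum_{i=1}^n x_iy_i-\sum_{i=1}^n x_i\sum_{i=1}^n y_i\right] \\ \sum_{i=1}^n(x_i-\bar{x})^2 &= \sum_{i=1}^n x_i^2-n\bar{x}^2 = \frac{1}{n}\left[n\sum_{i=1}^n x_i^2-\left(\sum_{i=1}^n x_i\right)^2\right] \end{align*}

The factor $1/n$ cancels in the ratio, so both forms give the same $\hat{\beta_1}$, and in both of them $n$ is the number of $(x_i,y_i)$ pairs.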

Since $n=5$, the table gives

$ \bar{x} =\frac{1}{5} \sum_{i=1}^{5} x_i=\frac{1}{5}(1+2+3+4+5)=3$

and

$ \bar{y} =\frac{1}{5} \sum_{i=1}^{5} y_i=\frac{1}{5}(3+6+8+9+0)=5.2$

so,

\begin{align*} \sum_{i=1}^n(x_i-\bar{x})(y_i-\bar{y}) &= \sum_{i=1}^5(x_i-3)(y_i-5.2)\\ &= (1-3)(3-5.2)+(2-3)(6-5.2)+(3-3)(8-5.2)\\ & \quad +(4-3)(9-5.2)+(5-3)(0-5.2) \\ &= -3 \end{align*}

and

\begin{align*} \sum_{i=1}^n(x_i-\bar{x})^2 &= \sum_{i=1}^5(x_i-3)^2 \\ &= (1-3)^2+(2-3)^2 +(3-3)^2+(4-3)^2+(5-3)^2 \\ &= 10 \end{align*}

so,

$$\hat{\beta_1}= \frac{-3}{10}=-0.3$$

and

\begin{align*} \hat{\beta_0} &=\bar{y}-\hat{\beta_1}\bar{x}\\ &= 5.2-(-0.3)\cdot 3 \\ &= 6.1 \end{align*}
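
The hand calculation above can be checked with a few lines of Python. This is only a minimal sketch: the function names `ols_centered` and `slope_raw_sums` are made up for illustration, and exact fractions are used so that no rounding gets in the way.

```python
from fractions import Fraction

x = [1, 2, 3, 4, 5]
y = [3, 6, 8, 9, 0]  # first y row of the table


def ols_centered(x, y):
    """Return (beta0_hat, beta1_hat) using the centered sums."""
    n = len(x)  # number of (x, y) pairs
    x_bar = Fraction(sum(x), n)
    y_bar = Fraction(sum(y), n)
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    beta1 = s_xy / s_xx
    beta0 = y_bar - beta1 * x_bar
    return beta0, beta1


def slope_raw_sums(x, y):
    """Slope from the raw-sum form, where n appears explicitly."""
    n = len(x)
    sum_xy = sum(xi * yi for xi, yi in zip(x, y))
    sum_xx = sum(xi * xi for xi in x)
    return Fraction(n * sum_xy - sum(x) * sum(y), n * sum_xx - sum(x) ** 2)


beta0, beta1 = ols_centered(x, y)
print(float(beta0), float(beta1))   # 6.1 -0.3
print(float(slope_raw_sums(x, y)))  # -0.3
```

Both routes use $n=5$, the number of pairs, and both give the same slope.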

Hence the fitted value of $y_i$ given $x_i$, denoted $\hat{y_i}$, is given by the equation

$$\hat{y_i}=6.1-0.3x_i$$

and each observation satisfies $y_i=\hat{y_i}+e_i$, where $e_i$ is the residual (the estimated error term). Now you can repeat the same process using the other set of values for $y$. Notice that in a simple linear regression model you never have more than one independent variable and one dependent variable, and the number of observations of each must match. That is, our data is a set of ordered pairs $$\{(y_i,x_i)\mid i=1,\ldots,n\}$$

which makes it clear that we need as many $y$ values as $x$ values. In practice you can often find more data for one variable than for the other, in which case we only use the observations for which both values are available. Does this make sense?
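
To make the "ordered pairs" point concrete, here is a small sketch that repeats the process for the second $y$ row. The variable names are illustrative only. It pairs the values with `zip`, which stops at the shorter list, so any trailing value without a partner would simply be left out (this assumes the values are matched by position).

```python
from fractions import Fraction

x = [1, 2, 3, 4, 5]
y_second = [4, 6, 1, 2, 4]  # second y row of the table

# The data set is a list of (x_i, y_i) pairs; n is the number of pairs.
pairs = list(zip(x, y_second))
n = len(pairs)

x_bar = Fraction(sum(xi for xi, _ in pairs), n)
y_bar = Fraction(sum(yi for _, yi in pairs), n)

slope = (sum((xi - x_bar) * (yi - y_bar) for xi, yi in pairs)
         / sum((xi - x_bar) ** 2 for xi, _ in pairs))
intercept = y_bar - slope * x_bar

print(n, float(slope), float(intercept))  # 5 -0.4 4.6
```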

JimmyJackson

Alternatively, if both $y$ rows are treated as two measurements $y_{i+}$ and $y_{i-}$ taken at each $x_i$ (so there are $10$ data points in total), the least squares line $y = ax + b$ minimizes

$\newcommand{\ds}[1]{\displaystyle{#1}}
\newcommand{\half}{{1 \over 2}}
\newcommand{\imp}{\Longrightarrow}
\newcommand{\pars}[1]{\left( #1 \right)}
\newcommand{\partiald}[3][]{\frac{\partial^{#1} #2}{\partial #3^{#1}}}$
$\ds{{\cal F}\pars{a,b} \equiv \half\sum_{i = 1}^{5}\sum_{\sigma = \pm}\pars{ax_{i} + b - y_{i\sigma}}^{2}}$

\begin{align} 0 &= \partiald{{\cal F}\pars{a,b}}{a} = \sum_{i = 1}^{5}\sum_{\sigma = \pm}\pars{ax_{i} + b - y_{i\sigma}}x_{i} = \sum_{i = 1}^{5}\pars{2x_{i}^{2}a + 2x_{i}b - x_{i}\sum_{\sigma = \pm}y_{i\sigma}} \\[3mm] 0 &= \partiald{{\cal F}\pars{a,b}}{b} = \sum_{i = 1}^{5}\sum_{\sigma = \pm}\pars{ax_{i} + b - y_{i\sigma}} = \sum_{i = 1}^{5}\pars{2x_{i}a + 2b - \sum_{\sigma = \pm}y_{i\sigma}} \end{align}

$$
\left\lbrace
\begin{array}{rcrcl}
\overbrace{\pars{2\sum_{i = 1}^{5}x_{i}^{2}}}^{\ds{\equiv\ S_{xx}}}\ a & + & \overbrace{\pars{2\sum_{i = 1}^{5}x_{i}}}^{\ds{\equiv\ S_{x}}}\ b & = & \overbrace{\sum_{i = 1}^{5}x_{i}\sum_{\sigma = \pm}y_{i\sigma}}^{\ds{\equiv\ S_{xy}}}
\\[3mm]
\underbrace{\pars{2\sum_{i = 1}^{5}x_{i}}}_{\ds{=\ S_{x}}}\ a & + & 10\,b & = & \underbrace{\sum_{i = 1}^{5}\sum_{\sigma = \pm}y_{i\sigma}}_{\ds{\equiv\ S_{y}}}
\end{array}\right.
$$

$$ S_{xx} = 110\,,\quad S_{x} = 30\,,\quad S_{xy} = 122\,,\quad S_{y} = 43 $$

$$
\left.
\begin{array}{rcrcl}
55 a & + & 15 b & = & 61 \\[1mm]
30 a & + & 10 b & = & 43
\end{array}\right\rbrace
\quad\imp\quad
\left\lbrace
\begin{array}{rclcr}
a & = & {61\times 10 - 43\times 15 \over 100} & = & -\,{7 \over 20} \\[3mm]
b & = & {55\times 43 - 30\times 61 \over 100} & = & {107 \over 20}
\end{array}\right.
$$

which yields

$$
y\pars{x} = {1 \over 100}\pars{-35x + 535}\,,\qquad\imp\qquad
\begin{array}{c|ccl}
x & y && \\[2mm]
1 & 5 & = & 5.00 \\[2mm]
2 & {93 \over 20} & = & 4.65 \\[2mm]
3 & {43 \over 10} & = & 4.30 \\[2mm]
4 & {79 \over 20} & = & 3.95 \\[2mm]
5 & {18 \over 5} & = & 3.60
\end{array}
$$
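
For anyone who wants to check the arithmetic of this pooled fit, here is a minimal sketch. It assumes both $y$ rows are stacked into $10$ points, and the variable names are illustrative. It builds the sums $S_{xx}$, $S_x$, $S_{xy}$, $S_y$ and solves the $2\times 2$ system by Cramer's rule with exact fractions.

```python
from fractions import Fraction

x = [1, 2, 3, 4, 5]
y_rows = [[3, 6, 8, 9, 0],
          [4, 6, 1, 2, 4]]

# Stack both y rows against the same x values: 10 (x, y) points in total.
points = [(xi, yi) for row in y_rows for xi, yi in zip(x, row)]
n_points = len(points)

s_xx = sum(xi * xi for xi, _ in points)   # 110
s_x = sum(xi for xi, _ in points)         # 30
s_xy = sum(xi * yi for xi, yi in points)  # 122
s_y = sum(yi for _, yi in points)         # 43

# Normal equations:  s_xx*a + s_x*b = s_xy,   s_x*a + n_points*b = s_y
det = s_xx * n_points - s_x * s_x
a = Fraction(s_xy * n_points - s_x * s_y, det)
b = Fraction(s_xx * s_y - s_x * s_xy, det)

print("a =", a, " b =", b)
for xi in x:
    print(xi, float(a * xi + b))
```

With this reading of the table the effective sample size is $10$. Fitting each $y$ row separately, as in the other answer, uses $n=5$ per fit.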

Felix Marin