I'm reading a stats textbook that covers the idea of leverage (page 99): $$ h_i = \frac{1}{n}+\frac{(x_i-\bar{x})^2}{\sum_{i'=1}^n{(x_{i'}-\bar{x})^2}}. $$ I understand the second term (the squared distance of the $i$-th value of the independent variable from the mean, divided by the sum of squared distances), but I don't understand why we add $\frac{1}{n}$ at the beginning. Moreover, the textbook claims that $h_i$ is bounded between $\frac{1}{n}$ and $1$, which implies that the second term is bounded above by $1 - \frac{1}{n}$, and I feel this should be proved. Can someone please explain why we add $\frac{1}{n}$ to calculate leverage and why the second term is bounded above by $1 - \frac{1}{n}$? Thanks.

  • 1
    One of my friends shared this link which explicitly derives the formula from the hat matrix: https://mandymejia.com/explicit-derivation-of-leverage-in-slr/ – FountainTree Jun 15 '22 at 23:05

1 Answer

The $\frac 1n$ comes from the fact that we are including an intercept term. The intercept column in the design matrix $X=\begin{bmatrix}1 & X_1\\ 1 & X_2 \\ \vdots & \vdots \\ 1&X_n\end{bmatrix}$ is all 1's, so every observation sits at the same position in that first dimension and each one gets the same contribution, $\frac 1n$. The second dimension then adds the term $\frac{(X_i-\bar X)^2}{\sum _{j=1}^n (X_j-\bar X)^2}$, which measures the distance from $X_i$ to the mean. Note also that the total leverage always equals $p$, the number of parameters (here $p=2$: the intercept and the slope). In the link in your comment there are two parts; the second derivation, without an intercept, defines leverage using only the second part of the leverage formula.
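If it helps, here is a quick numerical check rather than a proof. It is a minimal sketch assuming NumPy and a made-up vector `x` (the data and the variable names are mine, not from the textbook). It compares the diagonal of the hat matrix $H = X(X^TX)^{-1}X^T$ with the closed-form expression, and checks that the leverages sum to $p=2$ and stay between $\frac 1n$ and $1$. The upper bound should follow from $H$ being symmetric and idempotent, which forces every diagonal entry into $[0,1]$.

```python
import numpy as np

# Made-up predictor values, purely for illustration.
x = np.array([1.0, 2.0, 4.0, 7.0, 11.0])
n = x.size

# Design matrix with an intercept column of ones.
X = np.column_stack([np.ones(n), x])

# Hat matrix H = X (X'X)^{-1} X'; its diagonal holds the leverages.
H = X @ np.linalg.inv(X.T @ X) @ X.T
h_hat = np.diag(H)

# Closed-form leverage: 1/n plus the scaled squared distance from the mean.
d2 = (x - x.mean()) ** 2
second_term = d2 / d2.sum()
h_formula = 1.0 / n + second_term

print(np.allclose(h_hat, h_formula))          # the two computations agree
print(np.isclose(h_hat.sum(), 2.0))           # total leverage equals p = 2
print(np.all((h_hat >= 1.0 / n) & (h_hat <= 1.0)))
```

Since $h_i \le 1$, the second term can be at most $1-\frac 1n$, and it is at least $0$ (attained when $X_i=\bar X$), which gives the lower bound $\frac 1n$ on $h_i$.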

A helpful link for you:

More advanced links:

Vons
  • 11,004