
In the book "An Introduction to Multivariate Statistical Analysis" by T.W. Anderson, it is stated that the matrix $A=\sum_{i=1}^{N}(x_i-\bar{x})(x_i-\bar{x})^T$, where $\bar{x}=\sum_{i=1}^{N}x_i/N$, is positive definite with probability $1$.

How can one prove that $$\Pr(|A|=0)=0$$ when $N>p$, where $p$ is the dimension of the random vector $x$? Could anyone give me an outline?

1 Answer


I'll assume the $x_i$ are iid and drawn from a continuous nondegenerate distribution on $\mathbb{R}^p$. To make things easier, let's replace each $x_i$ with $y_i = x_i - \overline{x}$. Then $$ A = \sum_{i=1}^N y_i y_i^T. $$ For $A$ to be positive definite, it suffices to show that $v^T A v > 0$ for all nonzero $v \in \mathbb{R}^p$. Since $$ v^T A v = \sum_{i=1}^N v^T y_i y_i^T v = \sum_{i=1}^N (v \cdot y_i)^2, $$ this is certainly satisfied if $(v \cdot y_i)^2 > 0$ for every $i$, although it is enough that it holds for at least one $i$. Assuming $N > p$, we have $v^T A v > 0$ for all nonzero $v$ with probability one (why?).
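As a numerical sanity check (not a proof), here is a minimal sketch in Python with NumPy. The helper names `scatter_matrix` and `min_eigenvalue` are made up for illustration, and the standard normal distribution is only an assumed stand-in for "a continuous nondegenerate distribution". It compares the smallest eigenvalue of $A$ for $N = p + 1$ and for $N = p$.

```python
import numpy as np


def scatter_matrix(x):
    """Return A = sum_i (x_i - xbar)(x_i - xbar)^T, where the x_i are the rows of x."""
    y = x - x.mean(axis=0)  # centered vectors y_i = x_i - xbar
    return y.T @ y          # equals sum_i y_i y_i^T


def min_eigenvalue(n, p, rng):
    """Smallest eigenvalue of A for n iid standard normal samples in R^p (assumed distribution)."""
    a = scatter_matrix(rng.standard_normal((n, p)))
    return np.linalg.eigvalsh(a).min()  # eigvalsh: eigenvalues of a symmetric matrix


rng = np.random.default_rng(0)
p = 3

# N = p + 1 > p: A should be positive definite, so the smallest eigenvalue is positive.
min_eig_large = min_eigenvalue(p + 1, p, rng)

# N = p: the y_i sum to zero, so they span at most p - 1 dimensions and A is singular.
min_eig_small = min_eigenvalue(p, p, rng)

print("N = p + 1:", min_eig_large)
print("N = p    :", min_eig_small)
```

The second case shows why the condition is $N > p$ rather than $N \ge p$: centering costs one dimension, because $\sum_i y_i = 0$. In floating point the singular case shows up as an eigenvalue at roundoff level rather than exactly zero.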

Titus
  • Thanks for your answer. I also get the first part, but the second part, how the probability equals 1 assuming $N>p$, still puzzles me. – shawn Wong Dec 27 '15 at 13:30
  • We could have made the weaker statement that $(v \cdot y_i)^2 > 0$ for some $i$ (we just need the sum $v^T A v = \sum (v\cdot y_i)^2$ to be positive). Then if some nonzero vector $v$ fails this, it must be perpendicular to all the $y_i$. That means all $N$ of the $y_i$ must lie in a $(p-1)$-dimensional subspace. Of course $\{y_1, \dots, y_{p-1}\}$ will lie in such a subspace, but what is the probability that $y_p$ lies in the same space? Here is where we invoke the nondegeneracy of the distribution. – Titus Dec 28 '15 at 11:44
  • Thanks a lot, I think I have understood your idea! – shawn Wong Jan 01 '16 at 08:58