
Show that a statistical distance is a real distance.

Let $S=(s_{11},s_{22},\dots,s_{pp})$ be a vector of sample variances (assumed positive, so that the quotients below are defined) and consider three vectors $X=(x_1,x_2,\dots,x_p)$, $Y=(y_1,y_2,\dots,y_p)$ and $W=(w_1,w_2,\dots,w_p)$. The statistical distance is defined by $$D(X,Y)=\sqrt{\frac{(x_1-y_1)^2}{s_{11}}+\frac{(x_2-y_2)^2}{s_{22}}+\dots+\frac{(x_p-y_p)^2}{s_{pp}}}$$ I want to show that

a) $D(X,Y)=D(Y,X)$

b) $D(X,Y)=0$ if $X=Y$

c) $D(X,Y)>0$ if $X\neq Y$

d) $D(X,Y)\leq D(X,W)+D(W,Y)$
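
Before the proof, a quick numerical sanity check of a) to d) can be sketched in Python. This is only an illustration on made-up data, not a proof: the function name `statistical_distance` and the sample values of `s`, `X`, `Y`, `W` are my own assumptions, and every $s_{ii}$ is assumed strictly positive.

```python
import math

def statistical_distance(x, y, s):
    """Return sqrt(sum_i (x_i - y_i)^2 / s_ii) for equal-length sequences.

    Assumes every variance s_ii is strictly positive.
    """
    if not (len(x) == len(y) == len(s)):
        raise ValueError("x, y and s must have the same length")
    if any(v <= 0 for v in s):
        raise ValueError("every variance s_ii must be positive")
    return math.sqrt(sum((a - b) ** 2 / v for a, b, v in zip(x, y, s)))

# Hypothetical example data (not from the problem statement).
s = [4.0, 1.0, 9.0]
X = [2.0, 0.0, 3.0]
Y = [0.0, 1.0, 0.0]
W = [1.0, -1.0, 6.0]

d_xy = statistical_distance(X, Y, s)
d_xw = statistical_distance(X, W, s)
d_wy = statistical_distance(W, Y, s)

print("a) symmetric:", d_xy == statistical_distance(Y, X, s))
print("b) D(X, X) == 0:", statistical_distance(X, X, s) == 0.0)
print("c) D(X, Y) > 0:", d_xy > 0)
print("d) triangle inequality:", d_xy <= d_xw + d_wy)
```

A single example like this can only fail to contradict the claim, so the general argument is still needed.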

Proof:

a) Since $(x_i-y_i)^2=(y_i-x_i)^2$ for every $i$, $$D(X,Y)=\sqrt{\sum_{i=1}^p\frac{(x_i-y_i)^2}{s_{ii}}}=\sqrt{\sum_{i=1}^p\frac{(y_i-x_i)^2}{s_{ii}}}=D(Y,X)$$

b) If $X=Y$ then $x_i=y_i$ for all $i$, so $$D(X,Y)=\sqrt{\sum_{i=1}^p\frac{(x_i-y_i)^2}{s_{ii}}}=\sqrt{\sum_{i=1}^p\frac{(y_i-y_i)^2}{s_{ii}}}=0$$

c) If $X\neq Y$ then $x_i\neq y_i$ for at least one $i$, so $(x_i-y_i)^2>0$ for that $i$. Since every $s_{ii}>0$ and the remaining terms are nonnegative, $$D(X,Y)=\sqrt{\sum_{i=1}^p\frac{(x_i-y_i)^2}{s_{ii}}}>0$$

d) Here is the problem: I need to show the triangle inequality and prove the Cauchy-Schwarz inequality. Does the same argument as in the Euclidean distance proof hold in this case?

Roland
  • Are the $s_{jj}$ terms the same for different random variables? – Masacroso Oct 15 '16 at 20:31
  • @Masacroso I'm not sure about that, but I think not. – Roland Oct 15 '16 at 20:32
  • @Masacroso : Could you write $|x|$ instead of $||x||$? Notice the conspicuous difference between $|a||b|$ (which is standard notation coded as |a||b|) and $||a|| ||b||$ (which is your notation, coded as ||a|| ||b||). $\qquad$ – Michael Hardy Oct 15 '16 at 20:42

1 Answer


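Suppose that the $s_{jj}$ are the same for all the vectors (and positive), as the wording of the problem suggests, and write $Z=(z_1,\dots,z_p)$ for the intermediate vector (called $W$ in the question). Both sides of the triangle inequality $D(X,Y)\le D(X,Z)+D(Z,Y)$ are nonnegative, so after squaring it is equivalent to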

$$\sum\frac{(x_j-y_j)^2}{s_{jj}}\le \sum\frac{(x_j-z_j)^2}{s_{jj}}+\sum\frac{(z_j-y_j)^2}{s_{jj}}+2\sqrt{\sum\frac{(x_j-z_j)^2}{s_{jj}}\sum\frac{(z_j-y_j)^2}{s_{jj}}}$$

and then observe that

$$2\sqrt{\sum\frac{(x_j-z_j)^2}{s_{jj}}\sum\frac{(z_j-y_j)^2}{s_{jj}}}\ge 2\sum\frac{|z_j-y_j||x_j-z_j|}{s_{jj}}$$

by the Cauchy-Schwarz inequality. So it is enough to prove the weaker inequality below; comparing term by term

$$\sum\frac{(x_j-y_j)^2}{s_{jj}}\le \sum\frac{(x_j-z_j)^2}{s_{jj}}+\sum\frac{(z_j-y_j)^2}{s_{jj}}+2\sum\frac{|z_j-y_j||x_j-z_j|}{s_{jj}}$$

we can see that

$$(x_j-y_j)^2\le (x_j-z_j)^2+(z_j-y_j)^2+2|z_j-y_j||x_j-z_j|\implies\\ -x_jy_j\le z_j^2-x_jz_j-y_jz_j+|z_j-y_j||x_j-z_j|$$

I left the last statement unfinished. Check if it is true.
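
For what it's worth, here is a sketch of how the unfinished step seems to close, assuming every $s_{jj}>0$. The abbreviations $a_j=x_j-z_j$ and $b_j=z_j-y_j$ are introduced here only for convenience. Since $x_j-y_j=a_j+b_j$ and $a_jb_j\le|a_j||b_j|$,

$$(x_j-y_j)^2=(a_j+b_j)^2=a_j^2+b_j^2+2a_jb_j\le a_j^2+b_j^2+2|a_j||b_j|$$

which is exactly $(x_j-y_j)^2\le (x_j-z_j)^2+(z_j-y_j)^2+2|z_j-y_j||x_j-z_j|$. Dividing by $s_{jj}>0$ and summing over $j$ gives the term-by-term inequality, and the Cauchy-Schwarz step is the usual one applied to the vectors with components $|a_j|/\sqrt{s_{jj}}$ and $|b_j|/\sqrt{s_{jj}}$:

$$\sum\frac{|a_j||b_j|}{s_{jj}}\le\sqrt{\sum\frac{a_j^2}{s_{jj}}}\sqrt{\sum\frac{b_j^2}{s_{jj}}}$$

Put differently, $D(X,Y)$ is the ordinary Euclidean distance between the rescaled vectors with components $x_j/\sqrt{s_{jj}}$ and $y_j/\sqrt{s_{jj}}$, so the Euclidean proof should carry over unchanged.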

Masacroso