1

Background Information:

Starting with the sample $X_1,\ldots, X_{N}$ and sort the sample so that $X_1\leq X_2\leq \cdots \le X_N$. In our case the data set $x_1 = 0.2$, $x_2 = 0.6$, $x_3 = 0.7$. Suppose $X\sim \mathcal{U}(0,1)$ then the cumulative distribution function for $X$ is $$F(x) = x $$ We have $$D_N = \sum_{-\infty < x < \infty}|F_N(x) - F(x)|$$ where $$F_N(x) = \begin{cases} 0 \ &\text{if } x < X_1\\ k/N \ &\text{if } X_k\leq x < X_{k+1}\\ 1 \ &\text{if } x > X_N \end{cases}$$ The first and last terms are $$\sup_{x < X_1}|-F(x)| = F(X_1)$$ $$\sup_{x > X_N} |1 - F(x)| = 1 - F(X_N)$$ For the other terms, observe that $$\sup_{X_k\leq x < X_{k+1}}\left|\frac{k}{N} - F(x)\right| = \max\left(F(X_{k+1}) - \frac{k}{N},\frac{k}{N} - F(X_k)\right); k = 1, \ldots, N - 1$$

$D^{+}$ and $D^{-}$: $$\begin{aligned}D_{N}&=\max\left[F(X_{1}),\max_{k=1,\ldots,N-1}\left(F(X_{k+1})-\tfrac{k}{N},\tfrac{k}{N}-F(X_{k}),1-F(X_{N})\right)\right]\\ &= \max\left[\underbrace{\max_{k=1,\ldots,N}\left(\tfrac{k}{N}-F(X_{k})\right)}_{D^{+}},\underbrace{\max_{k=1,\ldots,N}\left(F(X_{k})-\tfrac{k-1}{N}\right)}_{D^{-}}\right]\\&=\max\{D^{+},D^{-}\}\end{aligned}$$ The formulas simplify if $X\sim U(0,1)$ since then $F(x)=x$.

Question:

Let $x_1,\ldots x_n$ be real numbers in $(0,1)$. Show that the Kolmogorov-Smirnov statistic of $x_1,\ldots x_n$ with the null hypothesis that the numbers follow the $U(0,1)$ distribution is precisely the star-discrepancy of these numbers. Having this equivalence in mind, compare and contrast the two statements "$\{x_n\}$ is a pseudorandom sequence from the uniform distribution $U(0,1)$ that passes the frequency test" with $"\{x_n\}$ is a u.d. mod $1$ sequence".

I am not really sure how to proceed with this, any suggestions are greatly appreciated.

Wolfy
  • 6,495
  • The sum defining $D_N$ is infinite. Sure about this formula? – Did Mar 19 '17 at 22:56
  • @Did Yes, I simply copied it from my notes, not sure why you think it is infinite though. – Wolfy Mar 20 '17 at 00:33
  • Because this is a sum of uncountably many nonzero terms. If this is the definition, why are you studying maxima later on? – Did Mar 20 '17 at 00:52
  • the sum is from $1,\ldots,N$ so I don't think it is uncountable – Wolfy Mar 20 '17 at 01:26
  • Sorry but I read $$D_N = \sum_{-\infty < x < \infty}\cdots$$ hence the sum is on every real number $x$. And why, later on, do maxima appear out of the blue? Please stop wasting everybody's time and explain once and for all what your question really is. – Did Mar 20 '17 at 07:13
  • @Did I am sorry but I cannot explain why $D_N$ is defined in that way, I have simply copied my professors notes, sorry to waste your time. – Wolfy Mar 20 '17 at 15:08
  • But then why do you write "The first and last terms are..." and the rest? This is also copied blindly from your professor's notes? – Did Mar 20 '17 at 15:46
  • The rest is for numerical purposes I didn't copy blindly, perhaps it would make more sense to see my c++ code of this in action – Wolfy Mar 20 '17 at 15:47
  • "Blindly" refers to the fact that, in effect, you cannot explain what is written in your own question. Sorry but I maintain the term (and please, no c++ code...). – Did Mar 20 '17 at 15:55

1 Answers1

1

Is star discrepancy defined as

$ \Delta_N = \sup_{x \in [0, 1]}\left|\frac{1}{N}\sum_{n=1}^{N} \mathbb{1}_{[0, x]}(x_n)-x\right|$ ?

In this case, note that $\frac{1}{N}|\sum_{n=1}^{N}\mathbb{1}_{[0,x]}(x_n) - x| = \frac{1}{N}|k-Nx| = |\frac{k}{N} - x|$, where $k$ is the number of $x_n<x$. This is equal to $|F_N(x)-F(x)|$ in the notation above.

  • I don't think this is a complete solution of the question posed, if you can add more details I will gladly accept your answer and award the points. – Wolfy Mar 19 '17 at 23:36
  • Assume the star discrepancy is defined as you have. Could you just comment on the last part of the question? – Wolfy Mar 20 '17 at 00:34