The Zipfian distribution serves as a good model for several interesting phenomena. For example, the frequencies of words in English (or almost any language) appear to follow a Zipfian distribution.

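Let's say I have a Zipfian distribution with exponent $s > 1$ (so that the normalizing sum, which is the Riemann zeta function $\zeta(s)$, converges):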

$$\textrm{pmf}(k) = \frac{1/k^s}{\sum_{n=1}^{\infty}1/n^s}$$
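To make the pmf concrete, here is a minimal sketch that evaluates it numerically. The Python standard library has no zeta function, so the helper `zeta` (a hypothetical name, not from the question) sums the first terms directly and adds an Euler–Maclaurin estimate of the remaining tail; this is an assumed approximation scheme, accurate to roughly $10^{-12}$ for moderate $s > 1$.

```python
import math


def zeta(s: float, terms: int = 1000) -> float:
    """Approximate the Riemann zeta function sum_{n>=1} n^-s for real s > 1.

    Sums the first `terms` terms directly, then adds an Euler-Maclaurin
    estimate of the tail sum_{n > terms} n^-s.
    """
    if s <= 1:
        raise ValueError("the series only converges for s > 1")
    m = terms
    head = math.fsum(n ** -s for n in range(1, m + 1))
    # Tail ~ integral from m to infinity, minus f(m)/2, plus -f'(m)/12.
    tail = m ** (1 - s) / (s - 1) - 0.5 * m ** -s + s * m ** (-s - 1) / 12
    return head + tail


def zipf_pmf(k: int, s: float) -> float:
    """pmf(k) = k^-s / zeta(s) for k = 1, 2, 3, ...; zero otherwise."""
    if k < 1:
        return 0.0
    return k ** -s / zeta(s)


print(zipf_pmf(1, 2.0))  # about 6/pi^2 = 0.6079...
```

For $s = 2$ the denominator is $\pi^2/6$, so the most common symbol has probability $6/\pi^2 \approx 0.61$.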

If I take a sample of size $N$ from this Zipfian distribution, how many distinct symbols will the sample contain? That count is itself random, so the real question is:

What is the distribution (as a pmf) of the number of distinct symbols in a sample of size $N$ drawn from $\textrm{Zipf}(s)$?
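This is not a closed-form answer, only a sketch for checking candidate answers numerically. Writing $D_N$ for the number of distinct symbols and $p_k = k^{-s}/\zeta(s)$, linearity of expectation over the indicators "symbol $k$ appears at least once" gives $E[D_N] = \sum_{k \ge 1} \bigl[1 - (1 - p_k)^N\bigr]$. The full pmf of $D_N$ is estimated below by Monte Carlo. The sampler uses the rejection method for Zipf variates described in Devroye's *Non-Uniform Random Variate Generation* (I believe NumPy's `Generator.zipf` uses the same method). All function names (`sample_zipf`, `distinct_count_pmf`, `expected_distinct`) and the truncation point `kmax` are illustrative assumptions.

```python
import math
import random
from collections import Counter


def zeta(s, terms=1000):
    # Truncated series plus an Euler-Maclaurin tail correction (valid for s > 1).
    m = terms
    head = math.fsum(n ** -s for n in range(1, m + 1))
    return head + m ** (1 - s) / (s - 1) - 0.5 * m ** -s + s * m ** (-s - 1) / 12


def sample_zipf(s, rng):
    """One draw from Zipf(s), s > 1, via Devroye's rejection method."""
    am1 = s - 1.0
    b = 2.0 ** am1
    while True:
        u = 1.0 - rng.random()  # u in (0, 1]
        v = rng.random()
        log_x = -math.log(u) / am1
        if log_x > 700.0:
            # exp(log_x) would overflow a float, so redraw. This trims a tail
            # that can only be reached when s is very close to 1.
            continue
        x = math.floor(math.exp(log_x))  # proposal with P(X >= k) = k^(1-s)
        t = (1.0 + 1.0 / x) ** am1
        if v * x * (t - 1.0) / (b - 1.0) <= t / b:
            return x


def distinct_count_pmf(s, n, trials=10_000, seed=0):
    """Monte Carlo estimate of P(D_n = d) for each observed d."""
    rng = random.Random(seed)
    counts = Counter(
        len({sample_zipf(s, rng) for _ in range(n)}) for _ in range(trials)
    )
    return {d: c / trials for d, c in sorted(counts.items())}


def expected_distinct(s, n, kmax=200_000):
    """E[D_n] = sum_k [1 - (1 - p_k)^n], truncated at k = kmax.

    Each omitted term is at most n * p_k, so the truncation error is at
    most n * P(K > kmax).
    """
    z = zeta(s)
    # -expm1(n * log1p(-p)) computes 1 - (1 - p)^n accurately for tiny p.
    return math.fsum(
        -math.expm1(n * math.log1p(-(k ** -s) / z)) for k in range(1, kmax + 1)
    )
```

For example, `distinct_count_pmf(2.0, 100)` returns an empirical pmf whose mean should be close to `expected_distinct(2.0, 100)`; comparing a conjectured pmf against such a histogram is a quick sanity check.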

  • Rather than "$N$ samples" you should say "a sample of size $N$" or "a sample consisting of $N$ realizations" or something like that. The individual observations are not "samples"; they are observations in a sample. $\qquad$ – Michael Hardy Jul 27 '16 at 01:31
  • Thanks @MichaelHardy, updated. Feel free to change it yourself if you see other improvements like that. – John Berryman Jul 27 '16 at 02:21