I'm trying to find the probability that, in a group of $N$ people, at least one district is not represented, where the districts have populations $n_{i}$ (for $i \in \mathbb{N}$ ranging from $0$ to $k$, so that $k+1$ is the number of districts). This boils down to asking for the probability that at least one outcome does not occur in $m$ trials of a multinomial distribution. How could I approach this problem?
-
What is $m$? Is it the same as $N$? – leonbloy Jan 01 '24 at 20:23
-
It seems that your first formulation of the problem is without replacement (presumably the $N$ people are distinct?), whereas what you say it "boils down to" corresponds to the problem with replacement. Which one do you mean? – joriki Jan 01 '24 at 20:27
-
If I understand the problem right (the problem is without replacement), this is not a multinomial but a multivariate hypergeometric distribution: https://en.wikipedia.org/wiki/Hypergeometric_distribution#Multivariate_hypergeometric_distribution – leonbloy Jan 01 '24 at 20:28
-
@leonbloy No I think that this is a multinomial, since if each trial takes $k$ people then all trials should add up to $N$? – Robert Murray Jan 01 '24 at 20:33
-
@RobertMurray Yes, but then the subsequent trials have different probabilities: https://math.stackexchange.com/questions/2354513/is-there-a-known-distribution-for-multinomial-without-replacement – leonbloy Jan 01 '24 at 20:58
-
@leonbloy I'm not sure; the problem doesn't specify how the selection process occurs. If we imagine each person choosing their own district, then each person follows a multinoulli distribution and the problem as a whole is a multinomial. But if we select each district in sequential order, it would be a multivariate hypergeometric distribution. From the phrasing "questioning the probability of at least one outcome not occurring in $m$ trials", it seems to be a multinomial without any extra description, even though this is the author's heuristic. – Robert Murray Jan 01 '24 at 21:10
1 Answer
The problem seems conceptually simple, if computationally a bit exhausting, assuming that every selection retains the same probability (i.e. we are indeed looking at independent draws from the population). We simply need to sum up the probability of every outcome where at least one district contributes nobody to the group (written loosely below as $n_i=0$, meaning the count drawn from district $i$, not its population). It's a daunting task, so let's look at simpler versions of the problem and work up from there.
The simplest problem would be where $k$ of the $k+1$ counts are zero - in effect, the probability that everyone is from a single district. That's very easy to calculate:
$$\sum_{a=0}^{k}\left (\frac{n_a}{\sum_i{n_i}}\right )^N$$
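As a minimal sketch of the single-district expression above, assuming independent draws where district $i$ is picked with probability $n_i/\sum_i n_i$: the function name `prob_single_district` and the example populations are made up for illustration, and exact fractions are used only to avoid rounding.

```python
from fractions import Fraction


def prob_single_district(populations, N):
    """Probability that all N independent draws come from the same district.

    Assumes each draw picks district i with probability n_i / sum(n).
    """
    total = sum(populations)
    return sum(Fraction(n, total) ** N for n in populations)


# Hypothetical example: three districts with populations 1, 1, 2 and N = 2.
# (1/4)^2 + (1/4)^2 + (1/2)^2 = 6/16
print(prob_single_district([1, 1, 2], 2))  # 3/8
```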
The next simplest problem is where $k-1$ or more counts are zero - so, the probability that everyone is from at most two districts. This is functionally equivalent to a sum of binomial probabilities, with some extra iterators to allow us to go through every pair of districts:
$$\sum_{x=0}^{k}\sum_{y=x+1}^{k}\sum_{a=0}^{N}\binom{N}{a}\left (\frac{n_x}{\sum_i{n_i}}\right )^a\left (\frac{n_y}{\sum_i{n_i}}\right )^{N-a}$$
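It's important to note that this sum includes the outcomes from the previous expression (they are the $a=0$ and $a=N$ terms). As written, though, each single-district outcome appears once for every pair containing that district, i.e. $k$ times, so those end terms have to be counted only once for the total to be a probability.
The next simplest is where $k-2$ or more counts are zero - so, that everyone is from at most 3 districts. This is functionally equivalent to a sum of 3-multinomial probabilities, again with extra iterators to manage the additional degrees of freedom (and with the same caveat about outcomes shared between different triples):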
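As a rough check of the two-district step, here is a minimal Python sketch, again assuming independent draws with probabilities $p_i = n_i/\sum_i n_i$. The function names and example populations are hypothetical. It relies on the inner binomial sum collapsing to $(p_x+p_y)^N$, and it drops the $a=0$ and $a=N$ terms from each pair so that single-district outcomes are counted once rather than once per pair. A brute-force enumeration is included only to confirm the result on a small case.

```python
from fractions import Fraction
from itertools import combinations, product


def prob_at_most_two(populations, N):
    """Probability that N independent draws touch at most two districts."""
    total = sum(populations)
    p = [Fraction(n, total) for n in populations]
    # Everyone from one district.
    single = sum(q ** N for q in p)
    # For each pair, (p_x + p_y)^N is the binomial sum over a = 0..N;
    # subtracting the two end terms leaves draws that use both districts.
    exactly_two = sum(
        (p[x] + p[y]) ** N - p[x] ** N - p[y] ** N
        for x, y in combinations(range(len(p)), 2)
    )
    return single + exactly_two


def brute_force_at_most_two(populations, N):
    """Same probability by enumerating every ordered sequence of N draws."""
    total = sum(populations)
    p = [Fraction(n, total) for n in populations]
    result = Fraction(0)
    for draw in product(range(len(p)), repeat=N):
        if len(set(draw)) <= 2:
            prob = Fraction(1)
            for d in draw:
                prob *= p[d]
            result += prob
    return result


# Hypothetical example: populations 1, 1, 2 and N = 3.
print(prob_at_most_two([1, 1, 2], 3))         # 13/16
print(brute_force_at_most_two([1, 1, 2], 3))  # 13/16
```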
$$\sum_{x=0}^{k}\sum_{y=x+1}^{k}\sum_{z=y+1}^{k}\sum_{a=0}^{N}\sum_{b=0}^{N-a}\binom{N}{a,b,\left (N-a-b\right )}\left (\frac{n_x}{\sum_i{n_i}}\right )^a\left (\frac{n_y}{\sum_i{n_i}}\right )^{b}\left (\frac{n_z}{\sum_i{n_i}}\right )^{N-a-b}$$
The pattern, I believe, should be getting pretty obvious. Your initial problem is equivalent to a sum of $k$-multinomial probabilities. Iterating through all of those probabilities would take $2k-1$ iterators - $k$ iterators for iterating through all combinations of $n_i$ values, and $k-1$ iterators for iterating through the multinomial. Unsurprisingly, this approach doesn't scale well for large values of $k$!
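For larger $k$, one alternative (not the method described above, but a standard substitute) is inclusion-exclusion over the set $S$ of districts that are forced to be absent. Under the same independent-draws assumption, with $p_i = n_i/\sum_j n_j$:

$$P(\text{at least one district missing}) = \sum_{\emptyset \neq S \subseteq \{0,\dots,k\}} (-1)^{|S|+1}\left(1-\sum_{i\in S}p_i\right)^N$$

The sketch below implements that formula; the function name and the example populations are hypothetical. It still visits $2^{k+1}-1$ subsets, so it is only a sketch for moderate $k$, but it avoids the nested multinomial iterators entirely.

```python
from fractions import Fraction
from itertools import combinations


def prob_some_district_missing(populations, N):
    """Probability that at least one district is absent from N independent draws.

    Inclusion-exclusion over subsets S of districts: the probability that
    every district in S is absent is (1 - sum of p_i for i in S) ** N.
    """
    total = sum(populations)
    p = [Fraction(n, total) for n in populations]
    result = Fraction(0)
    for size in range(1, len(p) + 1):
        sign = 1 if size % 2 == 1 else -1
        for subset in combinations(p, size):
            result += sign * (1 - sum(subset)) ** N
    return result


# Hypothetical example: populations 1, 1, 2 and N = 3.
# All three districts appear with probability 3! * (1/4)(1/4)(1/2) = 3/16,
# so at least one is missing with probability 13/16.
print(prob_some_district_missing([1, 1, 2], 3))  # 13/16
```

If the draws are really without replacement (the multivariate hypergeometric reading raised in the comments), the per-subset term would change, so this sketch should not be used as-is in that case.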