1

I'm looking for an equation that will help me calculate probabilities for a project at work. I've been Googling and reading posts here trying to figure out what I need but my unskilled brain is running into roadblocks. I hope that you can point me in the right direction.

In simplest terms, we have a promotion running that involves users finding 7 different virtual items evenly distributed among a field of essentially unlimited locations. In order to determine if the promotion is encouraging participation (i.e. visiting locations), we want to see if the number of people obtaining the complete set of 7 items is greater or lesser than what would be expected by chance given x number of location visits. I hope that makes sense.

I am looking for an equation that I can plug into Excel or SQL that returns this probability. I have found questions similar to this one here, but I can't seem to get over the hump of converting things into the usable equation I need (mostly due to my limited knowledge of statistics and notation).

Thank you in advance for your help,

Jon

  • Hey! I think we need some more information. Do you know, say, the probability of finding exactly one item in one visit? – jlammy Jan 17 '21 at 02:38
  • Sorry about that! The virtual items are distributed evenly and every location contains exactly one. Taken individually, there's a 1-in-7 chance of randomly encountering any one given item. – Jon Stanley Jan 18 '21 at 19:01

1 Answers1

0

As I take your question, you have a large number of locations (say $N$), of which $7$ of these contain a particular prize. You want to determine the probability that if a user randomly selects $x$ of these locations, then they will receive all $7$ prizes.

There is a special notation in mathematics called the binomial coefficient $\binom{n}{k}$, which gives the total number of ways to choose $k$ objects out of a total $n$. In this case, the total number of locations a user can choose is $$\binom{N}{x}$$ And if we require that $7$ of these choices are the 'prizes', then since the user must visit these locations, the remaining number of choices is $$\binom{N-7}{x-7}$$ and hence the probability is simply $$\frac{\binom{N-7}{x-7}}{\binom{N}{x}}$$ or $0$ if $x<7$.

In Excel, we can find the value $\binom{n}{k}$ with COMBIN(n,k), and hence you would want COMBIN(N,x)/COMBIN(N-7,x-7). Note that for this to work you require $x$ to be between $7$ and $N$, otherwise Excel may return an error.


UPDATE (in order to reflect the correction given in the comments)

Consider calculating the probability that a user receives all the prizes on the $n^\text{th}$ visit to a location, which we call $P(n)$. Since there are 7 possible prizes, the number of 'prize sequences' a user could encounter in $n$ locations is $7\times7\times\cdots\times 7=7^n$.

Now, the hard part is determining how many of these sequences involve the user finding the final prize at the $n^\text{th}$ location. To find this, we choose one of the 7 prizes to be the 'last discovered', and then count the number of weighs a user can find all remaining $6$ prizes in the first $n-1$ locations. Mathematically, this is equivalent to the number of surjections from $[n-1]$ to $[6]$ (don't worry about this if you don't understand it). In any case, the probability will be

$$P(n)=\frac{7\times 6!S(n-1,6)}{7^n}=\frac{6!S(n-1,6)}{7^{n-1}}$$

where $S(a,b)$ refers to the Stirling Numbers of the Second Kind. Since a user must visit at least $7$ locations, the average number of visits for a user will be

$$\mu=\sum_{n=7}^{\infty}nP(n)=\sum_{n=7}^{\infty}n\frac{6!S(n-1,6)}{7^{n-1}}$$

Doing some numerical calculation, I find this average to be $\approx 18.15$. I'm not sure Excel has a native option for finding these Stirling Numbers, but an explicit formula is given on the Wikipedia page.

GossipM
  • 405
  • Thank you for the clear explanation! How do things change if the 7 items are replicated across the field so that they are evenly distributed and every location contains exactly one? (In other words, does the equation change if there are an indefinite number of items present in the indefinite number of locations, but they are evenly distributed so that every location has a 1-in-7 chance of holding a given item?) – Jon Stanley Jan 18 '21 at 19:04
  • Hi @JonStanley. I think I understand your question properly now, and I have updated my answer accordingly. Unfortunately the maths gets a little trickier! – GossipM Jan 19 '21 at 10:20
  • 1
    Thank you so much for your help, @GossipM! – Jon Stanley Jan 19 '21 at 19:22