It is known that many people dump their pet goldfish into lakes when they want to get rid of them, with no remorse! We wish to calculate the number of goldfish in a small lake that also contains several other species of fish. For this purpose we catch 20 goldfish, put a permanent mark on them, and release them back into the lake. After one day (assuming there are no changes in the population of goldfish or other fish, i.e. the system is “closed”), we catch 30 goldfish, of which 5 are found to be marked. What is the probability that the total population of goldfish in the lake is from 115 to 125?

I have found that this is done using the “mark and recapture” method, which gives the estimated population as

$\frac{20 \times 30}{5} = 120$

But how do we calculate the probability for it to be in the requested range?

By intuition, I would guess it must be close to 100%!

  • I have an idea: This can be seen as estimating the parameter $p$ in a repeated Bernoulli-distributed trial! The proportion of fish that are marked is equal to $p$ (it's actually equal to $\frac{\text{20}}{\text{something}}$), we do 30 trials and we have 5 successes. So now it's a matter of estimating $p$. – Matti P. Aug 06 '20 at 10:48
  • If you take $X$ to be the number of marked fish that were recaptured, then an estimator of total population is $\hat N=\frac{20\times 30}{X}$ where $X$ can be assumed to have a hypergeometric distribution (for sampling without replacement). Observed value of $\hat N$ is indeed 120 which is an estimate of total population. – StubbornAtom Aug 06 '20 at 12:01
  • Lincoln–Petersen method. Well done, but don't know about the probability. – Pradeep Suny Aug 06 '20 at 13:39
  • L-P method is problematic. Makes no sense to speak of unbiasedness bc/ L-P provides no probability dist'n. It's possible to get $k=0$ marked fish upon resample. (That's one reason for Chapman method.) To put a proper dist'n on $N$ as a rand var instead of a parameter to be estimated, we need a Bayesian framework, alluded to but not developed in Wikipedia article. Lacking that, @StubbornAtom mentions the only reasonable ans: $k=5$ is only $k$ with $115 \le \hat N \le 125,$ so we must have $k=5.$ Your intuition $\approx 100\%$ looks good. // In R, phyper(5, 20, 100, 30) returns 0.6222. – BruceET Aug 09 '20 at 05:19

1 Answer

Your question

What is the probability that the total population of goldfish in the lake is from $115$ to $125$?

is only meaningful from a Bayesian perspective.

Your experiment tells you that the population is at least $45$ as you know there are $20$ marked fish and at least the $30-5=25$ unmarked fish you found.
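
To make that lower bound explicit, here is the hypergeometric likelihood written out, taking $X$ (as in the comments) to be the number of marked fish among the $30$ recaptured and $N$ to be the unknown total population. This is only a sketch of the standard sampling-without-replacement model, assuming every goldfish is equally likely to be caught:

$$P(X = 5 \mid N) = \frac{\binom{20}{5}\binom{N-20}{25}}{\binom{N}{30}}$$

The factor $\binom{N-20}{25}$ is zero unless $N - 20 \ge 25$, so the likelihood vanishes for every $N < 45$.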

To use a Bayesian calculation, you need a prior distribution for the population, and this will affect the posterior probability you calculate from the observation. For example, using R:

  • with an improper prior probability that is constant (i.e. just looking at sums of likelihoods) you could try
    sum(dhyper(5, 20, (115:125) - 20, 30) * 1) / 
    sum(dhyper(5, 20, (45:10^6) - 20, 30) * 1)
    # 0.08099914
  • with a proper prior in which the probability of the population being $N$ is proportional to $\frac{1}{N^2}$, you could try
    sum(dhyper(5, 20, (115:125) - 20, 30) * 1/(115:125)^2) / 
    sum(dhyper(5, 20, (45:10^6) - 20, 30) * 1/(45:10^6)^2)
    # 0.1072485

which suggests that your guess that the probability the population is in that range "must be close to $100\%$" is far too high. Even if you were absolutely sure that there were in fact between $100$ and $150$ fish in the pond, say, so that 45 is replaced by 100 and 10^6 by 150 in the code above, the calculated probabilities for the range would still be relatively low.
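
The bounded-support variant described above can be tried with a small wrapper around the same `dhyper` sums. This is only a sketch: the helper names `likelihood` and `posterior_prob` and their arguments are made up here for illustration and are not part of the calculation shown earlier.

```r
# Likelihood of finding 5 marked fish in a recapture sample of 30 when the
# lake holds N goldfish, 20 of which are marked (hypothetical helper name).
likelihood <- function(N) dhyper(5, 20, N - 20, 30)

# Posterior probability that the population lies in lo..hi, for a prior
# proportional to prior(N) on the integers N_min..N_max.
# The default prior is constant, i.e. a plain ratio of summed likelihoods.
posterior_prob <- function(lo, hi, N_min, N_max, prior = function(N) 1) {
  target  <- lo:hi
  support <- N_min:N_max
  sum(likelihood(target) * prior(target)) /
    sum(likelihood(support) * prior(support))
}

# Population assumed to be between 100 and 150, flat prior
p_flat_bounded <- posterior_prob(115, 125, 100, 150)

# Same support, prior proportional to 1 / N^2
p_inv_sq_bounded <- posterior_prob(115, 125, 100, 150,
                                   prior = function(N) 1 / N^2)

print(c(flat = p_flat_bounded, inverse_square = p_inv_sq_bounded))
```

Calling `posterior_prob(115, 125, 45, 10^6)` repeats the constant-prior calculation from the first bullet, so the wrapper can be checked against that figure before changing the support or the prior.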

Henry