Suppose that you have $n$ observations, $y_1, y_2, ...y_n$ that are the result of the process $aX$ where $a$ is a real constant and $X$ is a random variable such that $X\sim\mathrm{Poisson}(-b/a)$. It is the case that $0>a>1$ and $b>0$. I am trying to use maximum likelihood to estimate $a$ and $b$. I can do it for $b$, and have made some progress for $a$, but need help.
To illustrate, here's a density plot for two simulated data sets. The blue one is $b=-\ln 0.9, a=0.1$ and the pink one is $b=-\ln 0.9, a=0.001$.
The approach, suggested by spaceisdarkgreen, is that we set $x_i = y_i/a$ at the beginning, so that the log-likelihood function is:
$$\ell(\theta) = \ln \big(\frac{(b/a)^{\sum y_i / a} e^{-nb/a}}{\prod (y_i / a)!} \big)$$
$$\ell(\theta) = \sum{(y_i / a)}\ln{b/a} - nb/a - \sum{\ln [(y_i / a)!]}$$
Take the partial derivative w.r.t $b$:
$$ \frac{\partial \ell}{\partial b} = \frac{\sum y_i}{ab} - n/a$$
Setting to zero and multiplying through by $a$ we get
$$ \hat b = \frac{\sum y_i}{n} $$
I can veryify by simulation that this is a good estimator for $b$.
With that in the bag, we try to do the same for a:
$$ \frac{\partial \ell}{\partial a} = -\frac{\sum{y_i}\log{b/a}}{a^2} - \frac{\sum y_i}{a^2} + \frac{nb}{a^2} - \sum \frac{d}{ds}\log [(y_i / a)!]$$
At this point I replaced the factorial with the gamma function. I learned that the logarithmic derivative of the gamma function has its own name, the digamma function, denoted by $\psi()$, so the last term becomes $ -\frac{y_i}{a^2} \psi(y_i/a) $
Multiplying through by $a^2$, setting $b = \sum y_i / n$ which we already know, cancelling terms and distributing the log we get:
$$ \frac{\partial l}{\partial a} = -\sum{y_i}\log{\sum y_i} + \sum{y_i}\log{na} + \sum y_i \psi(y_i/a)$$
Set it equal to zero and we have
$$ \sum{y_i}\log{\sum y_i} =\sum{y_i}\log{na} + \sum y_i \psi(y_i/a)$$
I can check that this equality holds, which is does for relatively small values of $a$ and relatively large values of $b$:
n = 10000; a = 0.02; b = 0.1
x = rpois(n, b/a)
y = a*x
# Left side
sum(y)*log(sum(y)) # Evaluates to 6958.385 on my test run
# Right side
# There are a few NaNs in the digamma(y/s)...these get multiplied by 0
# anyway so it actually doesn't matter what they get set to
dig = digamma(y/a)
dig[is.na(dig)] <- 0
sum(y) * log(n*a) + sum(y*dig) # Evaluates to 6958.799 on my test run
That's very good; but if you flip the values of $a$ and $b$ this equation is much further from holding.
My question (which has by now undergone extensive editing!) is how to proceed from here. Does what I've done look justifiable/right? Is there a way to use the last equation to solve analytically or numerically for $a$? Why does it only seem to work well for a range of values? Thanks!
