0

I am trying to improve my implementation of the maximum likelihood (ML) estimator for the multiple squared correlation (https://onlinelibrary.wiley.com/doi/10.1111/j.1467-842X.1985.tb00559.x).

The ML estimator takes as input the normal $R^2$ value, the sample size $N$, and the number of predictors $p$. It returns an estimate for the multiple squared correlation ($\hat{\rho}^2$) by finding $\rho^2$ that maximizes the likelihood of the normal $R^2$ value. In (https://onlinelibrary.wiley.com/doi/10.1111/j.1467-842X.1985.tb00559.x), it is shown that instead of the likelihood, one can maximize the effective likelihood

$$L\left(p,N,R^2;\rho^2\right)=\left(1-\rho^2\right)^{N/2}{}_2F_1\left(\frac{1}{2}N,\frac{1}{2}N,\frac{1}{2}p;\rho^2R^2\right)$$

where ${}_2F_1$ denotes the Gaussian hypergeometric function. The ML estimate is thus $argmax_{\rho^2}L(p,N,R^2;\rho^2)$, with $p$,$N$,$R^2$, all being fixed values.

My current R implementation is quite rudimentary and looks like this

effectiveLH <- function(Rsquared, N, p, rhoSquared) {
  -1 * (1 - rhoSquared)^(N / 2) * gsl::hyperg_2F1(0.5 * N, 0.5 * N, 0.5 * p, rhoSquared * Rsquared)
}

mlEstimator <- function(Rsquared, N, p, llfunction = effectiveLH) { if (Rsquared <= p / N) { return(0) } else { likelihoodFunction <- purrr::partial(llfunction, Rsquared = Rsquared, N = N, p = p) res <- stats::optim(Rsquared, likelihoodFunction, method = "Brent", lower = 0, upper = 1) return(res$par) } }

This works well in many cases. However, it fails if both $R^2$ and $N$ are big. For example, for $N=500, p=2, R^2=.9$, it returns $0.64$, which is wrong, since according to the Corollary on page 177 in (https://onlinelibrary.wiley.com/doi/10.1111/j.1467-842X.1985.tb00559.x) $\hat{\rho}^2 \in (0.89960,0.89963)$. The reason for this is that the output for gsl::hyperg_2F1 becomes very big as rhoSquared becomes bigger in this case, which leads to an overflow and thus gsl::hyperg_2F1 returning Inf. I am looking for a solution for this. Possible ideas:

  1. Find a transformation of the effective likelihood that avoids overflow.
  2. Instead of finding the maximum of the effective likelihood numerically, try to work out an analytical solution.
  3. Increase the precision of the floating-point operations.
  4. Switch the library for the calculation of the hypergeometric function.
  5. Simply run the numerical optimization with the relatively tight bounds from the Corollary. I would prefer not to fall back to this solution, but it might be good enough in practice.

Regarding 1, I tried Euler's transformation, but this does not seem to work. Regarding 2, I realized that the derivative of the hypergeometric function is a very similar hypergeometric function. Thus, the same problem should occur. 3. seems impossible without changing the gsl library, which is beyond my capabilities. For 4., I found the hypergeo package in R, which leads to similar problems. However, Mathematica does not seem to suffer from the overflow problems. I guess because it does computations symbolically. But Mathematica is, of course, not free and thus does not mix too well with R.

Julian Karch
  • 151
  • 4

1 Answers1

1

Hope this could help:

Consider the gaussian hypergeometric function $F(a,b;c;z)$.

If one of the numeratorial parameters, say $b$, is large, then

$$ F(a,b;c;z) \approx M(a;c;bz)$$

where $M(a,b;z)$ is the Kummer function also called confluent hypergeometric function or ${}_{1}F_{1}(a,b;z)$.

Conversely, if the denominatorial parameter is large, we have an approximation as a Tricomi function:

$$F(a,b;c;z) = \left(\frac{c}{1-z}\right)^{a}U\left(a;1+a-b;\frac{c}{1-x} \right)\quad c>>0$$

whis is defined by

$$U(a;c;z) = \frac{\Gamma(1-c)}{\Gamma(1+a-c)}M(a;c;x) + \frac{\Gamma(c-1)}{\Gamma(a)z^{c-1}}M(1+a-c;2-c,x)$$

Bertrand87
  • 2,171
  • This paper may also help: https://link.springer.com/article/10.1007/s11075-016-0173-0 – Bertrand87 Feb 15 '22 at 16:04
  • Thanks a lot for your input! The Krummer function approximation seems not accurate enough for my usecase. ${}_1F_2(250,250;2;0.6)\approx 10^{318}$ but ${}_1F_1(250;2;150)\approx 10^{200}$. The denominatorial parameter $c$ is not necessarily large when the calculation overflows. To my knowledge, the techniques from the paper are already implemented in the gsl package. – Julian Karch Feb 17 '22 at 13:09