TL;DR
$$y = 659.113 \log\left(\frac{x}{240.399} + 1\right) + \frac{-1213.555 x}{x + 739.845}$$
Let $f: [0, +\infty) \to [0, +\infty)$ be the function that maps the number of users to threshold, i.e. the green curve

$f$ should satisfy these properties:
- $f = \Theta{(\log(x))}$ as $x \to +\infty$ by assumption
- $f$ is infinitely differentiable (we take the derivatives at $x = 0$ to be right-handed derivatives at $0$, and say it is differentiable at $0$ if right-handed derivatives exist at $0$)
- $f(0) = 0$ from the graph
- $f'(0) \approx 1$ since $f$ more or less coincides with the blue line of slope $1$ as $x \to 0^+$
- $f'(x) > 0$ on $[0, +\infty)$ since $f$ is strictly increasing
- $f''(x) < 0$ on $[0, +\infty)$ since $f$ is strictly decelerating
To fit the green curve, we first make some measurements (blue dots). Then, we use R to perform regression analysis (using the nls_multstart function from the nls.multstart package).
Because of the $\Theta{(\log(n))}$ requirement, the obvious choice is to assume $y = a \log (\frac{x}{b} + 1)$ for some unknown $a, b \in \mathbb{R}$. Unfortunately, this gives us a poor fit.
We need to improve our model. To proceed, let's assume there is some correction term such that $y = a \log(\frac{x}{b} + 1) + \mathcal{O}(1)$. A commonly used $\mathcal{O}(1)$ nonlinear model is $\frac{rx}{x + s}$ for some unknown $r, s \in \mathbb{R}$. So let's add it to our model
$$y = a \log\left(\frac{x}{b} + 1\right) + \frac{rx}{x + s}$$
nls_multstart finds us the following best fit (rounded to 3 decimal places):
$$y = 659.113 \log\left(\frac{x}{240.399} + 1\right) + \frac{-1213.555 x}{x + 739.845}$$
From the above plot, we can see the best fit curve closely approximates the data points.
The above residual plot is also roughly symmetric with respect to the $x$-axis.
The above normal quantile-quantile plot is also roughly a straight line.
Next, we need to show properties $1$ - $6$ holds. Property $1$ obviously holds because $y = \Theta(\log x) + \mathcal{O}(1) = \Theta(\log x)$ as $x \to \infty$. To see property $2$ holds, observe
$$f'(x) = \frac{a}{x + b} + \frac{rs}{(x + s)^2} = \frac{a(x + s)^2 + r s (x + b)}{(x + b)(x + s)^2}$$
Now we can obtain $f^{(k)}$ by keep applying the quotient rule. Because of how quotient rule works and $f'$ being a rational function, $f^{(k)}$ can only be non-differentiable at $-b = -240.399$ or $-s = -739.845$, which means $f$ is infinitely differentiable on $[0, +\infty)$.
Property $3$ holds by simple computation. Property $4$ holds because $$f'(0) = \frac{659.113}{0 + 240.399} + \frac{-1213.555}{(0 + 739.845)^2} = 1.101... \approx 1$$
Now $f'(x) = 0$ exactly when $$a(x + s)^2 + r s (x + b) = 659.113 x^2 + 77440.315994 x + 1449386311.619988 = 0$$ which has negative discriminant, implying $f'$ has no real roots and hence never crosses the $x$-axis. From above, $f'(0) = 1.101... > 0$ and $f'$ is differentiable hence continous on $[0, \infty)$. By the Intermediate Value Theorem, $f'(x) > 0$ on $[0, +\infty)$ holds (property $5$). Similar arguments show $f''$ has no non-negative real roots, continous, take a negative value at $x = 0$, and hence property $6$ holds again by the IVT.
Here is the R code I use for regression and plots:
## To the extent possible under law, the author(s) have dedicated all
## copyright and related and neighboring rights to this software to the
## public domain worldwide. This software is distributed without any
## warranty.
You should have received a copy of the CC0 Public Domain Dedication
along with this software.
If not, see <https://creativecommons.org/publicdomain/zero/1.0/>.
library('nls.multstart')
options(digits=8) # show 3 decimal places in summary
draw the residual plot using the x data points and residuals
residual_plot <- function(x, res)
{
plot(x, res, ylab='Residuals')
abline(0, 0)
}
draw the normal quantile-quantile plot using the residuals
norm_q_q_plot <- function(res)
{
qqnorm(res)
qqline(res)
}
draw the best fit curve using the x, y data points
and a function f taking x data points to best fit value
best_fit_curve <- function(x, y, f)
{
plot(x, y)
lines(x, f(x), col='red')
}
data points
x <- 0:41 * 25
y <- c( 0, 24, 48, 68, 84, 102, 114,
128, 142, 152, 164, 174, 182, 194,
202, 212, 218, 230, 236, 244, 252,
260, 268, 274, 282, 288, 296, 302,
308, 316, 322, 328, 336, 342, 348,
354, 360, 366, 372, 378, 384, 390)
best fit model of the equation a * log(x / b + 1) + r * x / (x + s)
model <- nls_multstart(y ~ a * log(x / b + 1) + r * x / (x + s),
iter=500,
start_lower=-rep(max(x), 4),
start_upper=rep(max(x), 4),
supp_errors='Y')
compute best fit value from x data points using the best fit model
f <- function(x)
{
coefs <- round(coef(model), 3)
coefs['a'] * log(x / coefs['b'] + 1) +
coefs['r'] * x / (x + coefs['s'])
}
residuals
res <- y - f(x)