Finding a function coefficients from rounded values

Question

I have a floored quadratic function with unknown coefficients in such form:

$$y=\lfloor a(x+b)^2 \rfloor$$

I also have some (about a thousand) pairs of integer values, for example these:

$\begin{matrix} x & 280 & 281 & 282 & 283 & 284 & 285 & 286 & 287 & 288 & 289 &290 & 291 \\ y & 5026 & 5026 & 5027 & 5027 & 5027 & 5027 & 5028 & 5028 & 5028 & 5028 & 5028 & 5029 \end{matrix}$

How should I approach searching for coefficients $a,b$ ?

I don't have anything near the minimum of $y$, all the values that I have available happen to fall on a rather linear part of the curve.

The original equation was written by human and the correct coefficients almost certainly contain less than 10 significant digits, probably only 3-5. With simpler cases (where $x$'s near $y=0$ were present) it was obvious that I found the intended coefficients by the brevity of those values.

In this case I tried using Excel to put a quadratic trendline through and extract coefficients from the fitted equation but I couldn't manage to find the exact coefficients that would change to next integer at the correct $x$ value.

You mention that you have dealt with simpler versions of the same problem. It might help to know in what context these problems arose. What is the end goal of solving this? For what purpose did this other person write these equations? — Tyberius, Sep 05 '17 at 03:05
@Tyberius these are the IAAF scoring points for athletics. The example data is taken from women's pentathlon (indoor tables) offset by 5000. The combined events are the ones that I struggle unlocking formulas for. https://www.iaaf.org/news/iaaf-news/scoring-tables-2017 — Džuris, Sep 05 '17 at 12:22
From what I have found from searching, the issue might be that the combined events don't use a quadratic. From what I have seen, they use $a(b-x)^c$, where a, b, and c vary by event. I found a link to a PDF which contains these values on a Czech track and field website: http://www.atletika.cz/sys/FileStorage/download/8/7790/iaaf-scoring-tables-for-combined-events.pdf — Tyberius, Sep 05 '17 at 13:28
@Tyberius those formulas are for the IAAF Scoring tables for combined events. But I am looking at IAAF Scoring tables of Athletics. The difference is that your tables assign points for each of the events and are needed to make the overall score for the combined event like decathlon. The other tables (that I want) assign IAAF points equivalent to the points one got in the decathlon. I want the function that takes 6522 decathlon points as argument and assign 900 IAAF points to that performance. — Džuris, Sep 05 '17 at 17:01

JJacquelin · Answer 1 · 2017-09-10T11:05:02.177

The method proposed below is not fully justified on the theoretical viewpoint. Nevertheless it seems to give good results in practice.

I suppose that a more thorough study of the statistical theory could improve it.

The data is : $$(x_1\:,\:y_1)\:,\:(x_2\:,\:y_)\:,\:...(x_k\:,\:y_k)\:,\:...(x_n\:,\:y_n)$$ $x_k\,\:y_k$ are integers.

The function to fit is : $$y=\lfloor a(x+b)^2\rfloor$$ $$y\leq a(x+b)^2<y+1$$ $$a(x+b)^2\simeq y+0.5$$ $$Ax+B\simeq \sqrt{y+0.5}\quad \text{with}\quad \begin{cases}A=\sqrt{a}\\B=b\sqrt{a} \end{cases}$$ CALCULUS : $$\begin{bmatrix} A \\ B \end{bmatrix}= \begin{bmatrix} \sum_{k=1}^n x_k^2 & \sum_{k=1}^n x_k \\ \sum_{k=1}^n x_k & n \end{bmatrix}^{-1} \begin{bmatrix} \sum_{k=1}^n x_k \sqrt{y_k+0.5} \\ \sum_{k=1}^n \sqrt{y_k+0.5} \end{bmatrix} $$ $$a=A^2\quad;\quad b=\frac{B}{A}$$ COMPUTED VALUES : $$Y_k=\lfloor a(x+b)^2\rfloor$$ With the data given by the OP, the result is : $$a=0.00000272881371\quad;\quad b=42639.1823$$ The computed $Y_k$ are all equal to the data values $y_k$.

This is probably a lucky case. But this shows anyways that the method isn't bad.

NOTE : The computation has to be carried out with accuracy. The numerical values of computed $a$ and $b$ must have enough digits. If they are rounded, some of the $Y_k$ can be deviated of $\pm$ one unit.

This is shown on Figure 2 where $a$ and $b$ are rounded : One point is deviated, compare to Figure 1.

Even more on Figure 3 where $a$ and $b$ are a bit more badly rounded : Four points are deviated.

Do you have any clues how to select values for sample for this method? Should I focus on the ones between the $y$ shift? I tried your values and outside the given sample range they don't work that well. btw I think you wanted $a=2.729/cdot 10^{-6}$ for the Figure 3, didn't you? — Džuris, Sep 10 '17 at 13:44
Sorry, I have no clue (In fact, don't understand exactly what it means). For all the Figues, the values of $a$ and $b$ used to compute $Y_k$ are written on each Figure. With $a=4.729:10^{-6}$ it is worse : seven points are deviated of $\pm$one unit. — JJacquelin, Sep 10 '17 at 15:27
Hi Claude ! It should be interesting to compare the models of the kind : $$SSQ=\sum_{i=1}^{12} \left(F[a(x_i+b)^2]-F[y_i+\frac 12] \right)^2$$ respectively with $F(X)=X$ (your model), $F(X)=X^{\frac 12}$ (mine) or with other functions $F(X)$. Of course with larger data than $n=12$ in order to make the differences more visible. Ideally $F(X)$ should be approached empirically or better from statistical investigation. A lot of work for the OP! If the game worth the candle. Cheers. — JJacquelin, Sep 13 '17 at 20:22

Claude Leibovici · Answer 2 · 2017-09-13T08:45:31.463

Incredibly too complex but done for the fun of it.

Inspired (once more) by JJacquelin's answer, I worked the minimization of $$SSQ=\sum_{i=1}^{12} \left(a(x_i+b)^2-(y_i+\frac 12) \right)^2$$ The model being linear with respect to $a$, we can express this parameter as a function of $b$ from $SSQ'_a=0$.

Plugging this expression in $SSQ'_b$ and simplifying, we end with the following monstrous polynomial in $b$ $$12127335 b^6-499793526603 b^5-728283224398290 b^4-418674571583641435 b^3-119935913306663810880 b^2-17155664107076603575730 b-980925105997787461048944$$ which, thanks to a CAS, is the product of two factors $$60335b^2+34451352 b+4918659046$$ (no real roots) and $$201 b^4-8398413 b^3-7291542096 b^2-2091027730787 b-199429376345064$$ which has four real roots (expressed with a bunch of radicals). Three of the roots are negative and the decimal representation of the root of interest is $$b=42639.649084838494786403657432884943571843815973595$$ leading to $$a=2.728754367425605034210057290271636640799352020754\times 10^{-6}$$ so close to the numbers reported in JJacquelin's answer and leading, for sure, to the same results.

Sorry, I can't follow language that well. What is $SSQ'_b$? How did $b^6$ happen to get there? I only see $b^4$ in SSQ. — Džuris, Sep 13 '17 at 09:43
@Džuris. Sorry for that : Sum of SQuares. The $b^6$ term is before the factorization. — Claude Leibovici, Sep 13 '17 at 09:47

user357980 · Answer 3 · 2017-09-05T02:14:59.877

This is by no means a full answer, but these are just my thoughts. Write $f(x) = a(x+b)^2$ to be the original function before you take the floor of its value to give the input, which we will call $g(x) = \lfloor f(x) \rfloor$ .

Perhaps you could try to put it in Excel, but only the points where the value of $y$ changes. I would imagine that all the flat parts is confusing it. For example, you know the real value where $f$ gives $5027$ is between $281$ and $282$. Maybe you could just assign $5027$ to their midpoint since on average it would seem that the change might be in the middle.(?)

Another strategy could be to write $$f(x+1) - f(x) = a(x+b+1)^2 - a(x+b)^2$$ $$ = a (x^2 + 2(b+1)x + (b+1)^2 - x^2 - 2bx - b^2) = a(x + 2b + 1),$$ which grows relatively linearly. And if we had it we could easily find $a$ and $b$. The question is how does $g(x+1) - g(x)$ behave in general and how is it related to $f(x+1) - f(x)$? Because $g$ is not changing much for these very large values and since you say that it seems fairly linear, I would imagine that $a$ is rather small, no matter what $b$ is. I would note that since it seems that $a$ is positive (the number of steps to change from $5027$ is two steps shorter than the number of steps to change from $5028$) and the vertex ($-b$) is less than the area that you are looking at since $g$ is increasing, it seems to me that the best indicator how what $f$ is is at the values of $g$ that are the largest, since there we see the least effect of the rounding down. You might want to try to fit a line to the differences of these values, where you plot a change in the middle of the interval where the jump happened. For instance, for this info, you might include: $(281.5, 5026.5), (285.5, 2027.7), (290.5, 2028.5)$ In fact, for finding $a$ it should not matter if there is a $.5$ or not since the rate of change is not affected by adding $0.5$ to each of the corrdinates on the line. It might matter in the long run though for the quadratic fit for large values of $x$.

If you also look not at neighboring steps but much larger jumps: $f(x + n) - f(x) = a(2nx + 2bn + n^2),$ with an error of $\pm 2$ in the measurement, because of the rounding... but in general the error is likely to be near $1$.

There is only one step difference between changing from 5027 and changing from 5028. I included that so it would be visible that the rounding does that... actually in that range $y$ changes after every 4 steps but about every sixth time it takes a fifth "leap" step. To see the apparent linearity here are some $(x,y)$ pairs: $(286,5028), (3119,5740), (5384,6344)$. And you are correct in your value expectations - I found by my Excel approach that $a\approx 3\cdot 10^{-6}$ and $b\approx 41000$. — Džuris, Sep 03 '17 at 12:28
Are you sure about your last expression? I just tried repeating it and I'm pretty sure $a$ is a common factor for all of the terms not only for the first one. — Džuris, Sep 03 '17 at 12:33

Finding a function coefficients from rounded values

3 Answers3