1

I have the following set of data:

enter image description here

Plotting this on Excel gave me a nice, smooth curve:

enter image description here

With the Polynomial Regression function, I get a 5th order polynomial that fits very well:

enter image description here

The issue arises when I transfer both the data values and the best fit curve to Desmos. The two don't seem to have any correlation at all!

enter image description here

I've checked the equations and the data innumerable times, and there doesn't seem to be any typos. Any ideas for why this discrepancy arises? I've done the regression, the data plotting, and the typo-check numerous times (even on multiple laptops), but have had no luck so far.

Any help will be greatly appreciated, thanks in advance.

  • 1
    Probably due to the precision of the coefficients.You are throwing some big numbers around in that polynomial to get a small value out. – Joffan Jul 30 '16 at 07:03
  • As confirmed with @ClaudeLeibovici's answer, the precision of the coefficients it was. Amazing how you spotted it - Thank you. – StopReadingThisUsername Jul 30 '16 at 08:11
  • I think the underlying issue is that a 5th-order polynomial is overfitting. You probably get more sensible numbers with a fourth-order fit. – Joffan Jul 30 '16 at 14:01

1 Answers1

3

If you want to try with exact values of the coefficients for the model $$y=\sum_{i=0}^5 a_i x^i$$ $$a_0=-\frac{1367716921770523}{33264123750}$$ $$a_1=\frac{318769904368}{12096045}$$ $$a_2=-\frac{898557942649}{133056495}$$ $$a_3=\frac{115064631832}{133056495}$$ $$a_4=-\frac{490812512}{8870433}$$ $$a_5=\frac{3423040}{2419209}$$

As Joffan commented, the problem is more than likely due to the precision of the coefficients generated by Excel.

Edit

Concerning the problem of the overfit (as commented by Joffan), let us have a look at the sum of squares of residuals $S_n$ as a function of $n$, the degree of the polynomial. We have $$\left( \begin{array}{cc} n & S_n \\ 1 & 0.176991 \\ 2 & 0.050324 \\ 3 & 0.017965 \\ 4 & 0.003115 \\ 5 & 0.002169 \\ 6 & 0.000903 \\ 7 & 0.000707 \\ 8 & 0.000661 \end{array} \right)$$

As you can see, for $n > 4$, the improvement is just very marginal. If I may suggest, plot $S_n$ as a function of $n$ and this will become quite clear.

  • Thank you so much. If I could, I'd post an image of this new curve going perfectly through all of the data values, but I unfortunately cannot. I do not understand how you were able to compute the exact values, though; could you please explain it to me? – StopReadingThisUsername Jul 30 '16 at 08:10
  • Well ! I wrote $7.45=\frac{745}{100}$, $-0.739=-\frac{739}{1000}$ and so on; then I minimized the sum of the squared errors using exact arithmetic (through the normal equations). – Claude Leibovici Jul 30 '16 at 08:29
  • I did it on excel, but got the answers rounded to a number of decimal places. How can I get exact fractions? Also, it took me a lot of time to get the rounded manually as I had to make a 6x6 matrix and do some matrix multiplication. Is that how you did it? – StopReadingThisUsername Jul 31 '16 at 01:04
  • Please respond - I did it on excel, but got the answers rounded to a number of decimal places. How can I get exact fractions? Also, it took me a lot of time to get the rounded manually as I had to make a 6x6 matrix and do some matrix multiplication. Is that how you did it? – StopReadingThisUsername Jul 31 '16 at 03:56
  • As I said earlier, I converted the values to fractions in order to be able to use exact arithmetic. Then, I used the so-called normal equations in order to compute all the required sum and finally solved the $6\times 6$ matrix problem to get the coefficients. – Claude Leibovici Jul 31 '16 at 04:05
  • If you don't mind me asking, what software did you use? – StopReadingThisUsername Jul 31 '16 at 04:07
  • No problem ! I have a very old version of Maxima which I modified over many years for some specific needs of mine. You can download it for free from SourceForge http://maxima.sourceforge.net/download.html – Claude Leibovici Jul 31 '16 at 04:12
  • @ClaudeLeibovici How were you able to get an entire table of the Sum of Square of Residuals values? Did you have to compute the solutions to the $3x3, 4x4, 5x5...$ matrix to do so? If not, how did you do it? – GoodChessPlayer Aug 02 '16 at 12:47
  • 1
    @GoodChessPlayer. This is what I had to do. But f you think abour it, it is very small work (almost when using matrix calculations). Cheers. – Claude Leibovici Aug 02 '16 at 17:24
  • @ClaudeLeibovici I have similar data as what OP has posted, and I'm also trying to find exact coefficients. Your answer begins with a simple equation - is that all you used to compute the exact coefficients? I am using Excel, and the way I'm doing it is very tedious (I have to compute the sum of all x, x^2, x^3... from the data, create a matrix, inverse the matrix, and then multiply it with another matrix). And I have to repeat this process for every different ordered polynomial, before I can compute the SSR values. So, the way I am doing it is not easy work - Is that how you did it as well? – GoodChessPlayer Aug 03 '16 at 05:26
  • @ClaudeLeibovici I am amazed by how you were able to do all that in just an hour after OP posted the question. Please do tell me a way I can make my method less tedious, if you know/used a different approach. – GoodChessPlayer Aug 03 '16 at 05:27
  • 1
    @GoodChessPlayer. As I said, I use exact arithmetics to obtain the exact coefficients; I do not suppose that you could do this with Excel. For the SSR, you do not need so much accuracy. In fact, I was cheating a little since I have a sofware of mine doing all the work automatically given the degree of the polynomial (the result is also the exact SSR). – Claude Leibovici Aug 03 '16 at 05:38
  • @ClaudeLeibovici I understand. I want to thank you for your time. – GoodChessPlayer Aug 03 '16 at 05:57
  • 1
    @GoodChessPlayer. You are very welcome ! Do not hesitate to contact me if I can be of any help (my e-mail address is in my profile). – Claude Leibovici Aug 03 '16 at 05:58