2

Problem:
Consider fitting the following data (y-values): $$ 16, 60, 180, 400, 700, 1200, 2000, 3000, 4300, 6000, 8000, 10500$$ where the x-values range from 1 to 12

into the following model: $$ y = \alpha + \beta x^\gamma $$ where $\alpha,\beta \in \mathbb{N}$ and $\gamma \in \mathbb{R}$.
The integral constraints may be relaxed. Exponential model may be considered.


Attempts and Questions:

I've tried:

  • Building the model in Geogebra and fiddling around with the parameters.
  • Using Wolfram Alpha, applying power fit, and exponential fit directly on the data. Wolfram uses least-squares best fit. The integral constraints were not satisfied.
  • Linearise the model based on Ian Miller's answer using Excel with $\beta = 11.05077$ and $\gamma = 2.6908$.

    1. What would be a general way to approach this sort of problem?
    2. Ignoring $\alpha$, is my model equivalent to the one used in wolfram? (confusion on my part, addressed)
    3. Should I minimise the error using percentage (relative error) instead of traditional squared difference? $10500 \pm 200$ shold be more tolerable compared to $16 \pm 200$, for instance.

Here are the results: $$ \begin{array}{cc|cccc} & & \text{Geogebra} & \text{Wolfram (power)} & \text{Wolfram (exponential)} & \text{Linearisation} \\ x & y & 4x^{3.17} & 4.88906x^{3.08717} & 198.887e^{0.33343x} & 11.05077x^{2.6908}\\ \hline 1 & 16 & 6 & 4.88906 & 277.596 & 11.05077\\ 2 & 60 & 48 & 41.5486 & 387.4539 & 71.35166\\ 3 & 180 & 162 & 145.271 & 540.7877 & 212.4371\\ 4 & 400 & 384 & 353.091 & 754.803 & 460.6971915\\ 5 & 700 & 750 & 703.177 & 1053.514 & 839.81\\ 6 & 1200 & 1296 & 1234.56 & 1470.44 & 1371.646\\ 7 & 2000 & 2058 & 1986.95 & 2052.362 & 2076.741\\ 8 & 3000 & 3072 & 3000.67 & 2864.579 & 2974.59\\ 9 & 4300 & 4374 & 4316.52 & 3998.229 & 4083.836\\ 10 & 6000 & 6000 & 5975.79 & 5580.518 & 5422.412\\ 11 & 8000 & 7986 & 8020.13 & 7788.993 & 7007.643\\ 12 & 10500 & 10368 & 10491.6 & 10871.47 & 8856.323\\ \end{array} $$

  • Is the first value for $x=0$ or for $x=1$? In geogebra you use $x=0$ but with excel you use $x=1$ – John Alexiou Oct 26 '15 at 17:12
  • I forgot I added an extra (x,y) = (0,0) in geogebra for the origin, the other pairs remained the same though, and the last data point in geogebra is the (12, 10500). I will reflect this in the question to make it clear, thanks :) I forgot to mention, so, yes, this means y can be anything at x=0. – Little Pillow Oct 26 '15 at 17:35
  • The point $(0,0)$ implies that $\alpha=0$ which might not be the case. You are leading the model by adding this point. – John Alexiou Oct 26 '15 at 18:33
  • Fair point (pun unintended). The geogebra file did not fix the model based on any of the points though, they only serve as a visual cue to get a general idea of the fitness and I simply play around with the parameters using the sliders. I've been ignoring it all this time. I will remove that point so there's no more confusion. – Little Pillow Oct 26 '15 at 18:44
  • @sllnJin This question has blown out a lot since you originally asked it. I suggest you attempt to refocus it to a specific thing. Are you interesting in investigating/learning about techniques or are you wanting to find the best fit for you data? – Ian Miller Oct 27 '15 at 04:56
  • I didn't think it would grow this big because I didn't know how big the subject was. During the course of discussing this, I also learned enough about the subject to realise that it wasn't so much as data fitting that I was interested in, but it was the model building. I was interested in figuring out if my model is right, and if not, how should I go about reverse engineering it. The only reason why I chose the model I described was because it was used to generate other data; the only lead I had. I agree with your remark that this is getting out of control. I will try to clean up the question. – Little Pillow Oct 27 '15 at 06:25
  • (exceeding character limit, continuing) However, I also found out that by reflecting this change, I cannot keep all of the existing answers relevant. Would the best option be to start a new, slightly related question for model building (if necessary after my own research on it) and accept one of the answers here according to the spirit of the original question about data fitting? I will, of course, try to edit the question to make it better while keeping the focus as is if this option is taken. – Little Pillow Oct 27 '15 at 06:37

3 Answers3

2

Mathematica provides a NonlinearModelFit function. Using this on your data gives as best fit $y=23.0164 + 4.65795\,x^{3.10605}$. Unfortunately integer constraints are not accepted. However, the best fit suggests taking $\alpha=23$, $\beta=5$. This gives $y=23 + 5\,x^{3.07641}$. The residuals are $$ {-12., -5.17575, 10.1776, 21.2413, -29.7907, -61.4688, -12.945, -23.8777, -34.3632, 15.1153, -16.272, 30.331} $$

  • This method gives a really good fit. I do not have access to Mathematica, however, and being able to solve this kind of problem beyond the 15-day trial periods is essential. I'm taking a look at the documentation for NonlinearModelFit method and was wondering if it would be equivalent to this Nonlinear least squares section I just found in the R manual since I do have access to R. – Little Pillow Oct 26 '15 at 15:26
  • Yes, I think it uses non linear least squares. – Julián Aguirre Oct 26 '15 at 16:11
  • Another thing you could try is "guess" a good value for $\alpha$ (it seems reasonable of start with $\alpha=16$), use a power model and choose $\beta$ to be the closest integer to the parameter four. This implies repeating the process for different values of $\alpha$. – Julián Aguirre Oct 26 '15 at 16:14
  • I've been trying to play around with different values for the parameters for a bit. Is there a more intelligent and systematic way of adjusting the values based on what the current one yields? For example, if the current parameters give a curve that _under_estimates earlier values, but _over_estimates later values, then we should adjust $\gamma$ to make it "bend" less, and then we may have to bring $\beta$ up a bit to push everything up since we bent down slightly too much, and so on and so forth. The model should be able to inform us of these behaviours, right? – Little Pillow Oct 26 '15 at 17:32
  • What you say seems reasonable, but I really can't tell. Moreover, $\alpha$ and $\beta$ are integers, so you cannot push them a little bit. – Julián Aguirre Oct 26 '15 at 18:24
  • @JuliánAguirre. It is a funny problem, indeed ! If you have time to waste, have a look at my answer and, please, comment. – Claude Leibovici Oct 27 '15 at 06:15
1

Note: Your model is a power model, e.g. $x$ is raised to a power. This is very different to a exponential model from Wolfram which has $x$ in the exponent position. There are many possible models you can attempt to fit your data to. Which one to pick will depend on the shape of the data and the physics behind what is generating the data - sometimes you know or suspect a certain link between the data.

This also depends what you mean by best fit. Wolfram probably has used a least squares fit. What you are describing is trying to minimize the percentage error (probably with a least squares approach). As such you will get two different results.

A lot of software packages will actually end up minimizing something other than the least squares fit as they will linearize the data based on the model then find the linear equation of best fit before converting the coefficients from that back to the chosen model. For example a standard exponential model of the form: $y=\alpha e^{\beta x}$ can be rewriten as $\ln y=\ln\alpha+\beta x$. If you plot $x$ vs $\ln y$ you will get a straight line with gradient of $\beta$ and intercept of $\ln\alpha$.

Your model however can not be linearized. Using three parameters prevents this (as far as I know). From your picture it appears you have set $\alpha$ to zero so if we do that your model can be linearized as follows:

$y=\beta x^\gamma$

$\ln y = \ln\beta + \gamma\ln x$

So if you take your data and make two new columns: $\ln y$ and $\ln x$ they should fit a straight line with gradient of $\gamma$ and intercept of $\ln\beta$. This will not minimize the sum of least squares but will minimize the sum of least squares of the logarithms of the values which might look more like what you were after. Again best fit can be somewhat subjective.

UPDATE: I've taken your table and looked at the sum of the least squares for both models and the the linearized model is better that the Wolfram one using the least squares of the logarithm as the metric for 'best fit'.

($y_m$ is from the linearized model and $y_w$ is from the Wolfram model)

$$ \begin{array}{c|c|cc} & y & \ln x & \ln y & y_m & y_w & \ln y_m & \ln y_w & \ln y_m - \ln y & \ln y_w - \ln y & (\ln y_m - \ln y)^2 & (\ln y_w - \ln y)^2 \\ \hline & y & ln x & ln y & y_m & y_w & ln y_m & ln y_w & ln y_m - ln y & ln y_w - ln y & (ln y_m - ln y)^2 & (ln y_w - ln y)^2 \\ & 16 & 0 & 2.77259 & 11.05062 & 4.88906 & 2.40249 & 1.587 & -0.3701 & -1.18559 & 0.13698 & 1.40562 \\ & 60 & 0.69315 & 4.09434 & 71.35208 & 41.54858 & 4.26763 & 3.72686 & 0.17328 & -0.36748 & 0.03003 & 0.13504 \\ & 180 & 1.09861 & 5.19296 & 212.44079 & 145.27129 & 5.35866 & 4.9786 & 0.16571 & -0.21435 & 0.02746 & 0.04595 \\ & 400 & 1.38629 & 5.99146 & 460.70883 & 353.09121 & 6.13277 & 5.86673 & 0.1413 & -0.12474 & 0.01997 & 0.01556 \\ & 700 & 1.60944 & 6.55108 & 839.83639 & 703.1769 & 6.73321 & 6.55561 & 0.18213 & 0.00453 & 0.03317 & 0.00002 \\ & 1200 & 1.79176 & 7.09008 & 1371.69573 & 1234.55534 & 7.2238 & 7.11847 & 0.13373 & 0.02839 & 0.01788 & 0.00081 \\ & 2000 & 1.94591 & 7.6009 & 2076.82579 & 1986.94883 & 7.6386 & 7.59436 & 0.03769 & -0.00655 & 0.00142 & 0.00004 \\ & 3000 & 2.07944 & 8.00637 & 2974.72218 & 3000.66617 & 7.99791 & 8.00659 & -0.00846 & 0.00022 & 0.00007 & 0 \\ & 4300 & 2.19722 & 8.36637 & 4084.0311 & 4316.52455 & 8.31484 & 8.37021 & -0.05153 & 0.00384 & 0.00266 & 0.00001 \\ & 6000 & 2.30259 & 8.69951 & 5422.68738 & 5975.79055 & 8.59835 & 8.69547 & -0.10117 & -0.00404 & 0.01023 & 0.00002 \\ & 8000 & 2.3979 & 8.9872 & 7008.01714 & 8020.13397 & 8.85481 & 8.98971 & -0.13239 & 0.00251 & 0.01753 & 0.00001 \\ & 10500 & 2.48491 & 9.25913 & 8856.81688 & 10491.59058 & 9.08894 & 9.25833 & -0.17019 & -0.0008 & 0.02896 & 0 \\ & & & & & & & & & \Sigma: & 0.32635 & 1.60308 \\ \end{array} $$

Ian Miller
  • 11,844
  • So we are basically transforming the data into a linear model using the assumption that we have about the model of the original data, then we find line of best fit for this linear model, which should now be a lot easier, then transform everything back! Ok, I'm fiddling around using this idea and will be updating the question, but I can't seem to obtain a good result from this yet :/ – Little Pillow Oct 26 '15 at 15:45
  • It seems I was brain-dead when working on this thinking that $x^\gamma$ is exponential and so I got confused when I saw the actual exponential function and still didn't realise it until you pointed it out. When I put power instead of exponential into wolfram, I did get results really close to what Julián Aguirre has done. Not quite sure how I should reflect this realisation on the questions. – Little Pillow Oct 26 '15 at 17:48
  • @Update Looking at the sum of least squares of differences of logarithms alone and I would not gather that $y_m$ is this far off compared to $y_w$. Most of the $y_m$ values were off by an order of magnitude. I can't help but doubt that I have done something wrong for the $y_m$. Am I right to assume a perfect fit isn't possible unless we overfit it or find the correct model? This goes back to your first point about understanding the characteristics of the data itself. – Little Pillow Oct 27 '15 at 04:05
  • 1
    @sllnJin I had made a critical typo in my original answer (and then carried it through). The slope of the linearized equation should be $\color{red}{\ln}\beta$ not just $\beta$. I've updated the table since and by that metric the linearized model is better. – Ian Miller Oct 27 '15 at 05:07
  • I would like to learn a bit more about your first paragraph. Is there any introductory resource(s) you could point me to? Your inputs have been very educational. – Little Pillow Oct 27 '15 at 08:11
1

I have been doing almost the same as Julián Aguirre did. Using $\alpha,\beta \in \mathbb{N}$ and $\gamma \in \mathbb{R}$ as yo you wished, I varied parameters $\alpha,\beta$ (running the calculation over a quite large grid o values) until I get a minimum value of the sum of squares.

It appears that, whatever $\alpha$ could be, $\beta=5$ always leads to the minimum value of the sum of squares. For this value of $\beta$, the following table gives the sum of squares as a function of $\alpha$ in the area of the minimum of $SSQ$. $$\left( \begin{array}{cc} \alpha & SSQ \\ 0 & 6988 \\ 1 & 6908 \\ 2 & 6842 \\ 3 & 6790 \\ 4 & 6752 \\ 5 & 6729 \\ 6 & 6719 \\ 7 & 6724 \\ 8 & 6743 \\ 9 & 6776 \end{array} \right)$$

while $\alpha=23$ would lead to $SSQ=8723$.

For the optimum value $(\alpha=6)$, rounded to next integer, the predicted values are $$\{11,48,153,362,714,1247,2000,3013,4327,5982,8019,10479\}$$ the residuals being $$\{5,12,27,38,-14,-47,0,-13,-27,18,-19,21\}$$ and the model is $$y=6+5 \,x^{3.07743}$$