
I have a load of data that consists of websites and the "power" of their backlinks. The lowest power is 1. The highest is 949876. The average is 6056.

I want to be able to assign each site a ranking from 1-100 that denotes how powerful it is. I thought I could do this using a nice graph like this one (it's supposed to be a smooth curve ;-)): https://i.stack.imgur.com/0fIRa.png

So, on the $x$-axis I have values from $0$ to $100$. I know the following points: $$x = 1, y = 1$$ $$x = 50, y = 6056$$ $$x = 100, y = 949876$$

How can I find $y$ for another value of $x$? Or am I approaching this in totally the wrong way?

Thanks!

  • Is your function a linear function? – Sigur Jan 27 '13 at 21:31
  • There are several functions for which this could be the case. Can we assume it would be of the form $ax^2+bx+c=y$? – kaine Jan 27 '13 at 21:32
  • @Sigur - Sorry, it's non-linear. – rastaboym Jan 27 '13 at 21:34
  • Without knowing anything about your function, there is no way to answer the question. Knowing the function value at a few scattered points tells you absolutely nothing about the function value at other points. – mrf Jan 27 '13 at 21:36
  • @Kaine - Sorry, I don't understand your question. I don't have a function. This is actual data and I'm looking to work out the most probable value of say x = 67 based on those 3 points. – rastaboym Jan 27 '13 at 21:37
  • is this homework? – kaine Jan 27 '13 at 21:38
  • @mrf - I've edited the question! Thanks – rastaboym Jan 27 '13 at 21:46
  • @kaine - No ;-) Real world. I've got a load of values and I'm trying to weight them down to a value between 1 and 100. Unfortunately, my maths is very rusty – rastaboym Jan 27 '13 at 21:47
  • Having only three points, you can't do anything reasonable. Is the function quadratic (as suggested by kaine), exponential or something completely different? If you have more points or some explanation where your data comes from, it may be possible to pose more intelligent guesses as to what a good model would be. – mrf Jan 27 '13 at 21:50
  • @mrf - thanks, I've added more detail to the question. – rastaboym Jan 27 '13 at 22:11
  • The popularity of things often follows Zipf's law. You may be better off fitting that to your data. (Although then you should reverse the $x$'s so that the most "powerful" site is at $x=1$, the second most powerful at $x=2$, and so on.) –  Jan 27 '13 at 22:30

3 Answers

2

Without more information about the shape of the graph, you cannot say anything about $y$ at other values of $x$.

If we assume that the equation has the form $ax^2+bx+c=y$, which is probably the simplest choice, substitute each known $(x, y)$ pair into it and solve for $a$, $b$, and $c$. As an engineer, this is the form I would fit to data like this if I had no further information, knowing that it gives a very inaccurate estimate.

The resulting equations are:

$$a+b+c=1$$ $$50^2a+50b+c=6056$$ $$100^2a+100b+c=949876$$

The values would be: $$a=218783/1155$$ $$b=-3671736/385$$ $$c=2159516/231$$
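As a rough sketch of how these three equations could be solved by hand or by machine, the snippet below runs Gauss-Jordan elimination with exact fractions. The function names `solve_quadratic_through` and `quadratic` are made up for this illustration; only the three data points come from the question.

```python
from fractions import Fraction

# The three known (x, y) points from the question.
points = [(1, 1), (50, 6056), (100, 949876)]

def solve_quadratic_through(points):
    """Return exact (a, b, c) with a*x^2 + b*x + c = y at the three points."""
    # Augmented matrix rows [x^2, x, 1 | y], kept as exact fractions.
    rows = [[Fraction(x * x), Fraction(x), Fraction(1), Fraction(y)]
            for x, y in points]
    n = 3
    for col in range(n):
        # Move a row with a non-zero entry in this column into pivot position.
        pivot = next(r for r in range(col, n) if rows[r][col] != 0)
        rows[col], rows[pivot] = rows[pivot], rows[col]
        # Scale the pivot row so the pivot entry becomes 1.
        p = rows[col][col]
        rows[col] = [v / p for v in rows[col]]
        # Eliminate this column from every other row.
        for r in range(n):
            if r != col:
                factor = rows[r][col]
                rows[r] = [v - factor * w for v, w in zip(rows[r], rows[col])]
    return tuple(rows[i][n] for i in range(n))

a, b, c = solve_quadratic_through(points)
print(a, b, c)

def quadratic(x):
    """Evaluate the fitted quadratic exactly."""
    return a * x * x + b * x + c

# Spot check away from the data: the value at x = 25 is below zero.
print(float(quadratic(25)))
```

The spot check at $x = 25$ shows why this form is only a crude estimate for this data: the curve dips below zero between the first two points.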

There are, however, an infinite number of other equations that would fit these 3 points. For instance, $$kx^3+jx^2+lx+m = y$$ passes through all three points for any value of $k$ and corresponding values of $j$, $l$, and $m$.

If you could give some context, I can help you out further, but I need more information. For simple questions like these, if I can't remember how to do it, I turn to Wolfram Alpha first. (Later I would use Matlab or Scilab for most of my calculations.)

Addendum: if you have a lot of them and you are not used to using other programs, place them in Excel and do curve fitting.

http://www.csupomona.edu/~seskandari/documents/Curve_Fitting_William_Lee.pdf

kaine
  • Note that your fitted function is negative for most of the interval between $1$ and $50$, which is likely not what is wanted. This is another effect of the problem being severely underspecified. –  Jan 27 '13 at 21:58
  • @kaine - Many thanks. The graph I'm after looks something like the image I posted above (http://i.stack.imgur.com/0fIRa.png), so unfortunately your first suggested function doesn't work (as Rahul points out). I will edit my question with some more info... hang on ... – rastaboym Jan 27 '13 at 22:02
2

As others have pointed out, it is very difficult to say much with the amount of information provided.

We could come up with an interpolating polynomial, such as:

$$f(x) = \frac{218783}{1155}x^2 -\frac{3671736}{385}x + \frac{2159516}{231}$$

This meets your specified criteria; however, it could behave in ways you did not expect!

Here is a WA Plot.

Notice that it passes through your three points exactly but does not look like your graph.

We could play around and define more points to make this look closer to your plot.

Update

I used Mathematica's Interpolating Polynomial, that is, using WA Int Poly.

If you could add more points, you would get better results (and maybe even a different curve type, using FindFit). This is part of Numerical Analysis.
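For readers without Mathematica, here is a minimal sketch of the same idea using the Lagrange form of the interpolating polynomial. It is not the Mathematica routine itself, and the helper name `lagrange_interpolate` is invented for this example. It evaluates the polynomial through the given points directly, so adding more points only means extending the list.

```python
def lagrange_interpolate(points, x):
    """Evaluate the lowest-degree polynomial through `points` at x."""
    total = 0.0
    for i, (xi, yi) in enumerate(points):
        # Basis term: equals yi at xi and 0 at every other known x.
        term = yi
        for j, (xj, _) in enumerate(points):
            if j != i:
                term *= (x - xj) / (xi - xj)
        total += term
    return total

# The three known points from the question.
points = [(1, 1), (50, 6056), (100, 949876)]

# The known points are reproduced; x = 67 is the asker's example value.
for x in (1, 50, 100, 67):
    print(x, lagrange_interpolate(points, x))
```

With only three points this is still a quadratic, so it inherits the same odd behaviour between the data points.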

Regards

Amzoti
  • thanks for taking the time to answer. Wolfram Alpha plot is very useful. How did you come up with the polynomial? I could get some graph paper out and define a few more points. Then what? – rastaboym Jan 27 '13 at 22:18
  • Needs an $\;\uparrow^+\;\;$ – amWhy May 05 '13 at 02:19
0

I'm doing this as a second answer because it is completely different.

I am assuming the form $ae^{bx}+c=y$ is suitable.

$$a=39.150079742937$$ $$b=0.100967288104848$$ $$c=-42.309401986998$$

Will that do? I can come up with others. Obviously you don't need that many decimal places.
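One way such constants could be found is sketched below; this is not necessarily how the values above were obtained. Subtracting pairs of the three equations removes $c$, and dividing the two differences removes $a$, which leaves a single equation in $b$ that bisection can solve. The bracket `[1e-6, 1.0]`, the helper names, and the rounding and clamping in `rank_of` are all assumptions made for this illustration.

```python
import math

# The three anchor points from the question: (rank, power).
(x1, y1), (x2, y2), (x3, y3) = (1, 1), (50, 6056), (100, 949876)

def ratio_gap(b):
    """Zero when a*exp(b*x) + c can pass through all three points."""
    target = (y3 - y2) / (y2 - y1)
    model = ((math.exp(b * x3) - math.exp(b * x2))
             / (math.exp(b * x2) - math.exp(b * x1)))
    return model - target

# Bisection on a hand-picked bracket (an assumption): ratio_gap is
# negative near b = 0 and positive at b = 1 for this data.
lo, hi = 1e-6, 1.0
for _ in range(200):
    mid = (lo + hi) / 2
    if ratio_gap(mid) < 0:
        lo = mid
    else:
        hi = mid
b = (lo + hi) / 2

# With b known, a and c follow from the first two points.
a = (y2 - y1) / (math.exp(b * x2) - math.exp(b * x1))
c = y1 - a * math.exp(b * x1)

def power_at(rank):
    """Model value y for a given rank x."""
    return a * math.exp(b * rank) + c

def rank_of(power):
    """Invert the model, then round and clamp to the range 1..100."""
    rank = math.log((power - c) / a) / b
    return min(100, max(1, round(rank)))

print(a, b, c)
print(rank_of(1), rank_of(6056), rank_of(949876))
```

The inverse in `rank_of` is what turns a site's power into a 1-100 ranking, which is the direction the question actually needs.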

kaine