
I have a set of 2D coordinates that represent a curve, and I'm struggling to find a function that roughly matches them.

Is there any software (free, preferably) that can fairly accurately guess a function that mostly matches a set of points (y = f(x))?

  • Hm... I think you first have to plot the points and see roughly what kind of function would fit well: exponential, polynomial, logarithmic, harmonic, etc. You can usually guess from the shape's properties (decreasing or increasing, number of zeros, and so on). After that, nearly every statistical package (Minitab, SPSS, R, for example) or general math software (Maple, for example) can do the fit: once you give it the type of model (exponential, polynomial, etc.), it can find the best fit of that type in some sense (usually least squares). –  Jun 11 '15 at 16:31
  • Does the set have a lot of elements? If you give me the data, then I can give clearer advice. –  Jun 11 '15 at 16:42
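To illustrate the "choose a family, then fit it" workflow from the first comment, here is a minimal Python sketch (standard library only) for an exponential model $y = a e^{bx}$. It uses the log-transform shortcut, a straight-line least-squares fit to $\log y$. That shortcut is an assumption of this sketch, not necessarily what the packages above do: it requires every $y > 0$ and minimizes error in $\log y$ rather than in $y$, so a dedicated nonlinear fit can give slightly different coefficients. The function names are hypothetical.

```python
import math


def fit_line(xs, ys):
    """Ordinary least-squares line: returns (slope, intercept)."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx


def fit_exponential(xs, ys):
    """Fit y ~ a * exp(b * x) via a straight-line fit to log(y).

    Assumes every y > 0. This minimises squared error in log(y),
    not in y, so it only approximates a true least-squares exponential fit."""
    b, log_a = fit_line(xs, [math.log(y) for y in ys])
    return math.exp(log_a), b


# Hypothetical noise-free data generated from y = 2 exp(0.5 x).
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [2.0 * math.exp(0.5 * x) for x in xs]
a, b = fit_exponential(xs, ys)
print(f"y ~ {a:.3f} * exp({b:.3f} * x)")  # prints y ~ 2.000 * exp(0.500 * x)
```

With real, noisy data you would first check on the plot that the points look exponential; if $\log y$ against $x$ is not roughly a straight line, a different family is probably a better choice.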

2 Answers


Since you said the function should "mostly" fit the points, one way to make this formal is to assume your data are values of some underlying function plus noise. Then you can do linear regression in any free tool such as Octave or R: fit a polynomial whose degree is fairly low compared to your number of points, using linear regression to estimate its coefficients (the model is nonlinear in x but linear in the coefficients). Regression is fast, so you can afford to run it 10 times, which also lets you choose the best polynomial degree for your data:

1. Split the data randomly into 10 sets of roughly equal size.
2. For each candidate degree, take each of the 10 sets in turn as the "test" set. Fit the polynomial to the other 9 sets combined, then compute the sum of squared errors on the left-out test set.
3. Add up these test errors over all 10 choices of test set. This gives a nearly unbiased estimate of the prediction error for that degree.
4. Use the degree with the lowest total test error.

This process is known as cross-validation (here, 10-fold cross-validation), and it is easy to code up in a language like Octave or R, which handle the linear regression part for you.
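The cross-validation procedure described above can be sketched in plain Python (standard library only). This is an illustrative sketch, not the answer's own code: the helper names (`solve`, `polyfit`, `polyval`, `cv_error`, `best_degree`) are hypothetical, the fit uses the normal equations (fine for low degrees, but numerically less robust than the QR decomposition that R's `lm()` uses), and the cubic-plus-noise data are made up for the demo.

```python
import random


def solve(A, b):
    """Solve the square linear system A x = b by Gaussian elimination
    with partial pivoting."""
    n = len(A)
    M = [list(row) + [bi] for row, bi in zip(A, b)]  # augmented matrix
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x


def polyfit(xs, ys, degree):
    """Least-squares coefficients c[0..degree] of c[0] + c[1] x + ... + c[degree] x^degree,
    from the normal equations (X^T X) c = X^T y."""
    k = degree + 1
    XtX = [[sum(x ** (i + j) for x in xs) for j in range(k)] for i in range(k)]
    Xty = [sum(y * x ** i for x, y in zip(xs, ys)) for i in range(k)]
    return solve(XtX, Xty)


def polyval(coeffs, x):
    return sum(c * x ** i for i, c in enumerate(coeffs))


def cv_error(xs, ys, degree, folds=10, seed=0):
    """Total squared test error over `folds`-fold cross-validation."""
    idx = list(range(len(xs)))
    random.Random(seed).shuffle(idx)  # random split, the same for every degree
    total = 0.0
    for f in range(folds):
        test = set(idx[f::folds])  # fold f: roughly len(xs) / folds points
        train = [i for i in idx if i not in test]
        coeffs = polyfit([xs[i] for i in train], [ys[i] for i in train], degree)
        total += sum((ys[i] - polyval(coeffs, xs[i])) ** 2 for i in test)
    return total


def best_degree(xs, ys, max_degree=8):
    """Degree in 0..max_degree with the lowest cross-validated error."""
    return min(range(max_degree + 1), key=lambda d: cv_error(xs, ys, d))


# Hypothetical demo data: a cubic plus Gaussian noise.
rng = random.Random(1)
xs = [i / 50 for i in range(-50, 51)]  # 101 points on [-1, 1]
ys = [x ** 3 - x + rng.gauss(0, 0.05) for x in xs]
d = best_degree(xs, ys)
print("chosen degree:", d, "coefficients:", polyfit(xs, ys, d))
```

On this example, degrees below 3 should show a clearly larger cross-validated error; the exact degree picked can vary with the noise and the random split. In R the fitting step would typically be `lm(y ~ poly(x, d))` instead of the hand-written solver.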

By the Weierstrass approximation theorem, any continuous function on a closed interval can be approximated arbitrarily well by a polynomial, and the approximation can be made better by raising the degree. With noisy data, though, a high enough degree starts fitting the noise (over-fitting), which is why the cross-validation step matters. So this is a good general approach if you don't know what kind of smooth function to use for your fit.
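One way to watch the approximation claim in action is with Bernstein polynomials, which give a constructive proof of the Weierstrass theorem. This is a swapped-in illustration, not the regression method above: Bernstein polynomials converge slowly and are not meant for fitting noisy data. The sketch below (Python, standard library; `math.comb` needs Python 3.8+) measures the worst-case error on $[0, 1]$ for $f(x) = \sin(3x)$, an arbitrary test function, as the degree grows.

```python
import math


def bernstein(f, n, x):
    """Degree-n Bernstein polynomial of f evaluated at x in [0, 1]:
    sum over k of f(k/n) * C(n, k) * x^k * (1 - x)^(n - k)."""
    return sum(f(k / n) * math.comb(n, k) * x ** k * (1 - x) ** (n - k)
               for k in range(n + 1))


def max_error(f, n, grid=200):
    """Largest |f(x) - B_n(f)(x)| over grid + 1 equally spaced points in [0, 1]."""
    return max(abs(f(i / grid) - bernstein(f, n, i / grid)) for i in range(grid + 1))


def f(x):
    return math.sin(3 * x)  # arbitrary continuous test function


errors = {n: max_error(f, n) for n in (2, 8, 32, 128)}
for n, err in errors.items():
    print(f"degree {n:3d}: max error {err:.4f}")
```

Because $\sin(3x)$ is concave on $[0, 1]$, the errors shrink steadily as the degree grows, but only roughly in proportion to $1/n$, which is one reason practical fitting relies on least squares instead.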

However, so-called "local polynomial regression" (popularized by Cleveland as LOWESS/LOESS) is often much more accurate than a single global polynomial. R implements it in the function loess(). It runs with a default smoothing span (span = 0.75), so you don't strictly need cross-validation to use it, although the span controls how smooth the fit is and can still be tuned. The fit is not given by any simple formula, but calling predict() on the fitted loess object returns fitted values at any inputs you choose within the range of the data. For example, you can evaluate it on a fine, uniformly spaced grid of inputs and plot the results to see what the fitted function looks like.
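To show what "local" means here, below is a simplified local linear smoother in the spirit of LOWESS, written in plain Python. It is a sketch under several assumptions and is not R's `loess()`: it fits a weighted straight line around each query point (R's default is `degree = 2`, a local quadratic), uses tricube weights over roughly the nearest `span * n` points, and skips the robustness iterations and the interpolation surface that real implementations use. The function name `local_linear` and the noisy-sine demo data are hypothetical.

```python
import math
import random


def local_linear(xs, ys, x0, span=0.75):
    """Simplified LOWESS-style estimate of f(x0): a tricube-weighted
    straight-line fit to the ceil(span * n) points nearest x0."""
    n = len(xs)
    q = min(n, max(3, math.ceil(span * n)))  # neighbourhood size
    near = sorted(zip(xs, ys), key=lambda p: abs(p[0] - x0))[:q]
    h = max(abs(x - x0) for x, _ in near)  # neighbourhood radius
    if h > 0:
        w = [(1 - (abs(x - x0) / h) ** 3) ** 3 for x, _ in near]  # tricube weights
    else:
        w = [1.0] * q
    sw = sum(w)
    if sw == 0:  # all neighbours lie exactly at distance h
        w, sw = [1.0] * q, float(q)
    mx = sum(wi * x for wi, (x, _) in zip(w, near)) / sw
    my = sum(wi * y for wi, (_, y) in zip(w, near)) / sw
    sxx = sum(wi * (x - mx) ** 2 for wi, (x, _) in zip(w, near))
    sxy = sum(wi * (x - mx) * (y - my) for wi, (x, y) in zip(w, near))
    slope = sxy / sxx if sxx > 0 else 0.0  # degenerate case: local weighted mean
    return my + slope * (x0 - mx)


# Hypothetical demo: noisy samples of sin(x), smoothed and evaluated on a grid.
rng = random.Random(0)
xs = [i / 20 for i in range(101)]  # 101 inputs on [0, 5]
ys = [math.sin(x) + rng.gauss(0, 0.1) for x in xs]
grid = [i / 10 for i in range(51)]  # evaluation grid on [0, 5]
fitted = [local_linear(xs, ys, g, span=0.3) for g in grid]
```

In R itself the equivalent is roughly `fit <- loess(y ~ x)` followed by `predict(fit, newdata = data.frame(x = grid))`. A smaller `span` follows the data more closely but gives a noisier curve.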

user2566092

Assuming that the curve can be approximated by a polynomial and you have enough points, you can fit a curve through them with any software that can solve a system of linear equations:

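Say you have points $p_1, p_2, \dots, p_n$ with $p_i = (x_i, y_i)$. You construct a polynomial of degree $n-1$, $P(x) = a_1x^{n-1} + a_2x^{n-2} + \dots + a_{n-1}x + a_n$, and form the $n$ equations $P(x_i) = y_i$ in the unknown coefficients $a_1, \dots, a_n$ (the $x_i$ must be distinct for the system to have a unique solution). For example, with two points it would look like: $$ a_1x_1+a_2 = y_1 $$ $$ a_1x_2+a_2 = y_2 $$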
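The system above can be set up and solved in a few lines. This Python sketch (standard library only; `interpolate` and `evaluate` are hypothetical names) builds the $n \times n$ system $P(x_i) = y_i$ with coefficients ordered $a_1, \dots, a_n$ as in the example, solves it by Gaussian elimination with partial pivoting, and evaluates $P$ with Horner's rule. It assumes the $x_i$ are distinct; for many points this matrix (a Vandermonde matrix) becomes badly conditioned, so treat it as a small-$n$ illustration.

```python
def interpolate(points):
    """Return [a_1, ..., a_n] so that P(x) = a_1 x^(n-1) + ... + a_(n-1) x + a_n
    passes through all n points (x_i, y_i); the x_i must be distinct."""
    n = len(points)
    # Row i of the augmented system: x_i^(n-1), ..., x_i, 1 | y_i
    M = [[x ** (n - 1 - j) for j in range(n)] + [y] for x, y in points]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))  # partial pivoting
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    a = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        a[r] = (M[r][n] - sum(M[r][c] * a[c] for c in range(r + 1, n))) / M[r][r]
    return a


def evaluate(a, x):
    """Horner's rule for a_1 x^(n-1) + ... + a_n."""
    result = 0.0
    for coeff in a:
        result = result * x + coeff
    return result


a = interpolate([(1.0, 3.0), (2.0, 5.0)])  # two points, so a line
print(a)  # [2.0, 1.0], i.e. P(x) = 2x + 1
```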

This is called interpolation, more specifically polynomial interpolation. It guarantees that the polynomial passes exactly through the points you specified, but it gives no guarantee about how well the polynomial represents the curve between or around those points. If you know the values of the curve's derivatives at a point $x$, you can use a Taylor expansion around that point instead.
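To make the lack of guarantees away from the given points concrete, here is a Python sketch using Runge's function $f(x) = 1/(1 + 25x^2)$, a standard textbook example that is not part of the original answer. It evaluates the interpolating polynomial through 11 equally spaced points on $[-1, 1]$ in Lagrange form (mathematically the same polynomial the linear system produces) and compares it with $f$ at the points and on a fine grid between them.

```python
def lagrange_eval(xs, ys, x):
    """Value at x of the polynomial of degree <= len(xs) - 1 through (xs[i], ys[i])."""
    total = 0.0
    for i, (xi, yi) in enumerate(zip(xs, ys)):
        basis = 1.0  # i-th Lagrange basis polynomial: 1 at xs[i], 0 at the other nodes
        for j, xj in enumerate(xs):
            if j != i:
                basis *= (x - xj) / (xi - xj)
        total += yi * basis
    return total


def runge(x):
    return 1.0 / (1.0 + 25.0 * x * x)


nodes = [-1.0 + 2.0 * i / 10 for i in range(11)]  # 11 equally spaced points on [-1, 1]
values = [runge(x) for x in nodes]

# Error at the interpolation points themselves ...
node_err = max(abs(lagrange_eval(nodes, values, x) - y) for x, y in zip(nodes, values))
# ... versus the error on a fine grid that includes points between them.
grid = [-1.0 + 2.0 * i / 1000 for i in range(1001)]
gap_err = max(abs(lagrange_eval(nodes, values, x) - runge(x)) for x in grid)
print(f"max error at the points: {node_err:.1e}; between them: {gap_err:.2f}")
```

The error at the points is zero up to rounding, while near the ends of the interval the error exceeds the size of the function itself (Runge's phenomenon). Chebyshev-spaced points instead of equally spaced ones, or a low-degree least-squares fit as in the other answer, avoid most of this.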