
I have a bunch of x's and their corresponding y values, but do not have a Wolfram Pro account. Is there another site where I can input my dataset and have it spit out a best-fit regression (be it linear, quadratic, cubic, etc.)?

user51819
  • Just in case you do not know: R (http://www.r-project.org/) is a good choice. – Yes Aug 18 '14 at 15:05
  • 1
    How about thinking before pressing some button? There are a zillion ways to "fit" a function to data, most of these you may not want. – Han de Bruijn Aug 18 '14 at 15:12
  • @HandeBruijn Sorry for not being a genius! – user51819 Aug 18 '14 at 15:37
  • 1
    @handebruijn Where does he say he will not think before using the tool? A tool is a tool, using it doesn't exclude thinking. – Daniel R Aug 18 '14 at 16:07
  • @DanielR : That is true: the fact that many students use calculators stupidly doesn't mean there's anything wrong with the calculators, and the fact that a student is shopping for a calculator doesn't mean the student wants to use it without thinking. But there's more to finding a "best fit" than just running an algorithm! What constitutes a "best fit" is highly context-dependent. – Michael Hardy Aug 18 '14 at 20:19

1 Answer


There is such a thing as ordinary least squares.
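As a minimal sketch of what an ordinary least squares polynomial fit (the "linear, quadratic, cubic" kind of fit the question asks about) actually computes, the plain Python below builds the design matrix with columns $1, x, \dots, x^d$ and solves the normal equations $(X^T X)c = X^T y$. The function names `solve` and `polyfit_ols` are made up for illustration and the data are invented.

```python
def solve(a, b):
    """Solve the square linear system a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    # Augmented matrix [a | b]; copying rows leaves the inputs untouched.
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        # Swap in the row with the largest pivot to limit round-off error.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back substitution on the upper-triangular system.
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def polyfit_ols(xs, ys, degree):
    """OLS fit of y = c0 + c1*x + ... + cd*x**d; returns [c0, c1, ..., cd]."""
    p = degree + 1
    # Design matrix: one row per data point, columns 1, x, x**2, ..., x**d.
    X = [[x ** k for k in range(p)] for x in xs]
    xtx = [[sum(row[i] * row[j] for row in X) for j in range(p)] for i in range(p)]
    xty = [sum(row[i] * y for row, y in zip(X, ys)) for i in range(p)]
    return solve(xtx, xty)


xs = [0, 1, 2, 3, 4]
ys = [1.1, 2.9, 7.2, 11.4, 17.1]  # made-up data
print(polyfit_ols(xs, ys, 1))  # straight-line coefficients
print(polyfit_ols(xs, ys, 2))  # quadratic coefficients
```

The normal equations become badly conditioned as the degree grows; library routines such as NumPy's `numpy.polyfit` use a more stable least-squares solver.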

There is such a thing as weighted least squares, perhaps used when data points represent groups of different sizes. Some fish grow at a rate proportional to their size; the variance of the logarithm of their random growth next year may thus be proportional to their present age. In such a case one would take the logarithm of the growth and use a weight proportional to the reciprocal of the age if the age is known, and perhaps proportional to the fish's size if the age is not known.
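A minimal sketch of weighted least squares for a straight line $y = a + bx$: it minimizes $\sum_i w_i (y_i - a - b x_i)^2$ using weighted means. The usage mirrors the fish example (response = log of growth, weight = 1/age), but the predictor `xs` and all the numbers are hypothetical, since the answer does not say what the growth is regressed on.

```python
import math


def wls_line(xs, ys, weights):
    """Weighted least squares fit of y = a + b*x; returns (a, b).

    Minimizes sum(w_i * (y_i - a - b*x_i)**2) for nonnegative weights w_i.
    """
    total = sum(weights)
    # Weighted means of x and y.
    x_bar = sum(w * x for w, x in zip(weights, xs)) / total
    y_bar = sum(w * y for w, y in zip(weights, ys)) / total
    # Weighted cross-product and sum of squares about those means.
    sxy = sum(w * (x - x_bar) * (y - y_bar) for w, x, y in zip(weights, xs, ys))
    sxx = sum(w * (x - x_bar) ** 2 for w, x in zip(weights, xs))
    b = sxy / sxx
    a = y_bar - b * x_bar
    return a, b


# Hypothetical fish data: ages in years, some predictor xs, observed growth.
ages = [1, 2, 3, 5, 8]
xs = [10.0, 12.0, 11.0, 14.0, 13.0]
growth = [2.0, 2.6, 2.1, 3.5, 3.0]
log_growth = [math.log(g) for g in growth]
# Variance of log growth assumed proportional to age, so weight = 1 / age.
print(wls_line(xs, log_growth, [1.0 / age for age in ages]))
```

With all weights equal this reduces to the ordinary least squares line.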

There is generalized least squares, in which one minimizes the norm of the vector of residuals with respect to an inner product whose matrix has nonzero off-diagonal entries. That is used when the errors are known to be correlated.
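A minimal sketch of generalized least squares for a straight line $y = a + bx$. It assumes you already have the symmetric positive-definite matrix $W$ defining the inner product $\langle u, v\rangle = u^T W v$ (typically the inverse of the error covariance matrix), and solves $(X^T W X)\beta = X^T W y$. To illustrate correlated errors, the hypothetical helper `ar1_precision` builds $W$ for errors with correlation $\rho^{|i-j|}$, a first-order autoregressive pattern chosen here as an assumption, not taken from the answer.

```python
def gls_line(xs, ys, W):
    """Generalized least squares fit of y = a + b*x; returns (a, b).

    W is a symmetric positive-definite matrix (list of lists); the fit
    minimizes r^T W r for the residual vector r = y - a - b*x.
    """
    n = len(xs)
    ones = [1.0] * n

    def inner(u, v):
        # Inner product u^T W v.
        return sum(u[i] * W[i][j] * v[j] for i in range(n) for j in range(n))

    # Normal equations (X^T W X)[a, b] = X^T W y for X = [1, x], solved by Cramer's rule.
    s11, s12, s22 = inner(ones, ones), inner(ones, xs), inner(xs, xs)
    t1, t2 = inner(ones, ys), inner(xs, ys)
    det = s11 * s22 - s12 * s12
    a = (s22 * t1 - s12 * t2) / det
    b = (s11 * t2 - s12 * t1) / det
    return a, b


def ar1_precision(n, rho):
    """Inverse of the n-by-n correlation matrix rho**|i-j| (assumes n >= 2, |rho| < 1)."""
    P = [[0.0] * n for _ in range(n)]
    for i in range(n):
        # Tridiagonal: 1 at the two corners, 1 + rho^2 elsewhere on the diagonal.
        P[i][i] = 1.0 if i in (0, n - 1) else 1.0 + rho * rho
        if i + 1 < n:
            P[i][i + 1] = P[i + 1][i] = -rho
    scale = 1.0 / (1.0 - rho * rho)
    return [[scale * v for v in row] for row in P]


xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.2, 2.8, 5.3, 6.9, 9.1]  # made-up data
print(gls_line(xs, ys, ar1_precision(len(xs), 0.6)))
```

With $W$ equal to the identity matrix this reproduces the ordinary least squares line; with $W$ diagonal it is weighted least squares.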

And there are methods other than least squares. In logistic regression, each response is $0$ or $1$, and a fit is done by iteratively reweighted least squares.
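A minimal sketch of iteratively reweighted least squares for logistic regression with one predictor, $P(y = 1) = 1/(1 + e^{-(b_0 + b_1 x)})$. Each pass computes weights $w_i = p_i(1 - p_i)$ and a working response $z_i = \eta_i + (y_i - p_i)/w_i$, then does a weighted least squares fit of $z$ on $x$. The function name, iteration cap, and tolerance are illustrative choices, the data are made up, and the loop assumes the 0s and 1s overlap (otherwise the estimates diverge).

```python
import math


def logistic_irls(xs, ys, iterations=25, tol=1e-10):
    """Fit P(y=1) = 1/(1+exp(-(b0 + b1*x))) by IRLS; returns (b0, b1)."""
    b0, b1 = 0.0, 0.0
    for _ in range(iterations):
        s_w = s_wx = s_wxx = s_wz = s_wxz = 0.0
        for x, y in zip(xs, ys):
            eta = b0 + b1 * x                 # linear predictor
            p = 1.0 / (1.0 + math.exp(-eta))  # fitted probability
            w = p * (1.0 - p)                 # IRLS weight
            z = eta + (y - p) / w             # working response
            s_w += w
            s_wx += w * x
            s_wxx += w * x * x
            s_wz += w * z
            s_wxz += w * x * z
        # Weighted least squares of z on x: solve the 2x2 normal equations.
        det = s_w * s_wxx - s_wx * s_wx
        new_b0 = (s_wxx * s_wz - s_wx * s_wxz) / det
        new_b1 = (s_w * s_wxz - s_wx * s_wz) / det
        converged = abs(new_b0 - b0) + abs(new_b1 - b1) < tol
        b0, b1 = new_b0, new_b1
        if converged:
            break
    return b0, b1


xs = [0, 1, 2, 3, 4, 5, 6, 7]
ys = [0, 0, 1, 0, 1, 0, 1, 1]  # made-up 0/1 responses
print(logistic_irls(xs, ys))
```

For logistic regression these IRLS steps coincide with Newton's method for the maximum-likelihood estimate.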

So what counts as the "best fit" is not always something an algorithm can give you without some understanding of the thing you're modeling.

A paper in the New England Journal of Medicine purported to do a regression in which chocolate consumption in each of 22 countries was correlated with the number of Nobel Prizes won by the inhabitants. It reported a p-value. But the validity of the p-value depended on an assumption of normal distribution, which anyone looking at the scatterplot would question. The scatterplot looked like the kind of thing where, if you take the logarithms of both the $x$- and $y$-coordinates, you might get a bivariate normal. I tried it and the result looked good. This is one of many instances (really, every applied problem you'll ever see) in which some skull sweat should be done before feeding the data into an algorithm.
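The log-log idea can be sketched as: transform both coordinates, then fit a straight line by ordinary least squares, which amounts to fitting a power law $y = e^a x^b$. This is not a reconstruction of the chocolate/Nobel analysis (that data isn't given here). The function name and the numbers are made up, and all values must be positive.

```python
import math


def loglog_fit(xs, ys):
    """Fit log(y) = a + b*log(x) by ordinary least squares; returns (a, b).

    Equivalent to fitting y = exp(a) * x**b. Requires all x, y > 0.
    """
    lx = [math.log(x) for x in xs]
    ly = [math.log(y) for y in ys]
    n = len(lx)
    mx, my = sum(lx) / n, sum(ly) / n
    sxy = sum((u - mx) * (v - my) for u, v in zip(lx, ly))
    sxx = sum((u - mx) ** 2 for u in lx)
    b = sxy / sxx
    return my - b * mx, b


xs = [0.5, 1.3, 2.8, 4.1, 9.0]  # made-up positive data
ys = [0.2, 1.1, 3.9, 2.5, 14.0]
print(loglog_fit(xs, ys))
```

Plotting the transformed points before trusting the fit is still the point; the code only does the arithmetic.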

I think there are web sites that can give you least squares fits.