Why do error models often contain "squares" of a value?

Question

Many error-minimization or approximation models operate on the "sum of squares" of the errors in the calculated values (e.g. the residual sum of squares).

What is the purpose of squaring the error? Is it to prevent negative values? If so, why not just use the absolute value?

Basically it's so that, if you assume the errors are normally distributed, you can calculate the probability of an error within a given range using standard probability tables. — D Stanley, Sep 11 '18 at 16:02
I understand if it does. I realize there is a Stack Exchange site that deals with math problems. — Taylor, Sep 11 '18 at 16:42
The sum of squared errors is one choice among many. It makes the math easier. — duffymo, Sep 11 '18 at 18:04

score 3 · Accepted Answer · answered Sep 11 '18 at 16:22

The sum of squared errors, divided by the number of observations, is the variance of the errors (when they have zero mean). Variance is an important quantity in statistics with many convenient properties.
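
A minimal sketch of that relationship, assuming a small hypothetical list of residuals chosen to have mean zero. It uses only Python's standard library.

```python
import statistics

# Hypothetical residuals (observed minus predicted), chosen so their mean is zero.
residuals = [1.5, -2.0, 0.5, -1.0, 1.0]

# Residual sum of squares.
sse = sum(r * r for r in residuals)

# Dividing by n gives the mean squared error. Because the residuals have
# mean zero, this equals their population variance.
mse = sse / len(residuals)

print(sse, mse, statistics.pvariance(residuals))
```

If the residuals did not have mean zero, `pvariance` would measure spread around their mean, while `mse` would still measure spread around zero. The two would then differ.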

If your errors are independent and normally distributed with a common variance, and the model is linear, minimizing squared error gives you the "maximum likelihood estimate". Briefly, this is the statistically best fit under those assumptions, and is often the best you can do.
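
The standard derivation behind this claim is sketched below. It assumes a linear model y_i = x_i^T β + ε_i with independent errors ε_i ~ N(0, σ²) and a fixed σ. The first term of the negative log-likelihood does not depend on β, and the factor 1/(2σ²) is positive, so minimizing the negative log-likelihood is the same as minimizing the sum of squared errors.

```latex
L(\beta) = \prod_{i=1}^{n} \frac{1}{\sqrt{2\pi\sigma^2}}
           \exp\!\left(-\frac{(y_i - x_i^\top \beta)^2}{2\sigma^2}\right)

-\log L(\beta) = \frac{n}{2}\log(2\pi\sigma^2)
               + \frac{1}{2\sigma^2}\sum_{i=1}^{n} (y_i - x_i^\top \beta)^2

\hat{\beta}_{\mathrm{MLE}} = \arg\min_{\beta}\bigl(-\log L(\beta)\bigr)
                           = \arg\min_{\beta}\sum_{i=1}^{n} (y_i - x_i^\top \beta)^2
```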

Even if the model is nonlinear, the sum of squared errors is a smooth function that is easy to compute and easy to differentiate, which makes it easy to optimize. The sum of absolute values is not differentiable wherever any of the errors crosses zero, which makes optimization more difficult.
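
A minimal sketch of the difference, assuming the simplest possible model: a single constant c fitted to three hypothetical data points. The derivative of the squared loss changes continuously with c. The derivative of the absolute loss is piecewise constant and jumps at every data point. Returning 0 exactly at a data point is one valid subgradient choice, not the only one.

```python
def grad_squared(c, ys):
    # d/dc of sum((y - c)**2): defined and continuous for every c.
    return sum(-2.0 * (y - c) for y in ys)


def grad_absolute(c, ys):
    # d/dc of sum(abs(y - c)): +1 for each y below c, -1 for each y above c.
    # It is undefined when c equals a data point; we return 0 for that term,
    # which is one valid subgradient.
    total = 0.0
    for y in ys:
        if c > y:
            total += 1.0
        elif c < y:
            total -= 1.0
    return total


ys = [1.0, 2.0, 6.0]

# Step across the data point y = 2: the squared-loss gradient barely moves,
# while the absolute-loss gradient jumps from -1 to +1.
for point in (1.999, 2.0, 2.001):
    print(point, grad_squared(point, ys), grad_absolute(point, ys))

# Plain gradient descent on the smooth squared loss converges to the mean (3.0).
c = 0.0
for _ in range(200):
    c -= 0.1 * grad_squared(c, ys)
print(c)
```

Running the same fixed-step descent on `grad_absolute` would tend to oscillate around the median rather than settle. This is why solvers for absolute error usually need special handling, such as subgradient methods or a linear-programming reformulation.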

One disadvantage of the sum of squared errors is that outliers can skew the results disproportionately: their errors are already large, and squaring makes them larger still. In this case, using the absolute error instead of the squared error can mitigate the problem, provided your optimizer can handle the non-smooth error function.
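
A small illustration, assuming hypothetical measurements with one bad value. For a model that predicts a single constant, the squared error is minimized by the mean and the absolute error by the median. A brute-force grid search confirms this below. A single outlier drags the squared-error fit far away, while the absolute-error fit stays put.

```python
import statistics


def sse(c, ys):
    # Sum of squared errors for the constant prediction c.
    return sum((y - c) ** 2 for y in ys)


def sae(c, ys):
    # Sum of absolute errors for the constant prediction c.
    return sum(abs(y - c) for y in ys)


clean = [10.0, 11.0, 9.0, 10.0, 10.0]
with_outlier = clean[:-1] + [100.0]  # one bad measurement replaces the last value

# Candidate constants 0.0, 0.1, ..., 100.0 for a brute-force check.
grid = [i / 10 for i in range(1001)]

for data in (clean, with_outlier):
    best_sq = min(grid, key=lambda c: sse(c, data))   # lands on the mean
    best_abs = min(grid, key=lambda c: sae(c, data))  # lands on the median
    print(f"mean={statistics.mean(data):.1f} best_sq={best_sq:.1f} "
          f"median={statistics.median(data):.1f} best_abs={best_abs:.1f}")
```

With the outlier, the squared-error fit moves from 10.0 to 28.0, while the absolute-error fit stays at 10.0.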
