When estimating a parameter, say the mean of a population, the definition of "bias" is clear: it is the difference between the expected value of the estimator (its average over random samples) and the true value of the parameter.
In machine learning, the same term "bias" (as in the bias-variance tradeoff) is used, but I have seen many different definitions of it. Is there a standard definition?
I have seen it defined as:
- The bias at a specific point $x$ is the average difference (over training samples) between $\hat f(x)$ and $f(x)$
- The same average difference, but additionally averaged over all data points $x$ rather than taken at a specific point
- The difference between the best $\hat f$ in the hypothesis class and $f$
- The difference between $\hat f$ and $f$ as the number of records in the sample goes to infinity
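To pin down what I mean, here is how I would write the four candidates. The notation is my own assumption, not taken from any one source: $D_n$ is a random training sample of size $n$, $\hat f_{D_n}$ is the model fitted on it, $\mathcal{H}$ is the hypothesis class, and $p$ is the distribution of the inputs. In the third line I take "best" to mean closest to $f$ in expected squared error, which is only one possible reading, and in the fourth line I leave open in what sense the limit is taken.

$$
\begin{aligned}
\text{(1)}\quad & \operatorname{Bias}(x) = \mathbb{E}_{D_n}\big[\hat f_{D_n}(x)\big] - f(x) \\
\text{(2)}\quad & \mathbb{E}_{x \sim p}\big[\operatorname{Bias}(x)\big] \\
\text{(3)}\quad & f^{*}_{\mathcal{H}} - f, \qquad f^{*}_{\mathcal{H}} = \arg\min_{h \in \mathcal{H}} \mathbb{E}_{x \sim p}\big[(h(x) - f(x))^2\big] \\
\text{(4)}\quad & \lim_{n \to \infty} \hat f_{D_n} - f
\end{aligned}
$$

Written this way, (1) and (2) depend on the sample size $n$, (3) depends only on the hypothesis class, and (4) depends on the learning procedure but not on $n$, so they are clearly not the same quantity.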
I would also like to know whether we should use different definitions for regression and for classification. In regression it makes sense to talk about a systematic error, because we can either underestimate or overestimate the value. In classification every error is systematic, because there is only one type of error that can be made.
A final question is how all this applies to classification with KNN. The conventional wisdom seems to be that a higher $k$ gives a higher bias, but applying the definitions above, I am not convinced this is always true.
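To make the KNN question concrete, here is a minimal sketch of how I would measure the first definition (bias at a specific point) by simulation. Everything in it is an assumption made for illustration: the one-dimensional input, the made-up function `true_prob` playing the role of $f(x) = P(Y = 1 \mid x)$, and the helper names `sample_dataset`, `knn_prob`, and `pointwise_bias`. It also treats $\hat f$ as the estimated class probability (the fraction of positive neighbours), not the hard 0/1 prediction.

```python
import random


def true_prob(x):
    # Hypothetical P(Y = 1 | x), chosen only for illustration.
    return 0.2 + 0.6 * x


def sample_dataset(n, rng):
    # Draw x uniformly on [0, 1] and a Bernoulli label with P(Y = 1 | x) = true_prob(x).
    data = []
    for _ in range(n):
        x = rng.random()
        y = 1 if rng.random() < true_prob(x) else 0
        data.append((x, y))
    return data


def knn_prob(data, x0, k):
    # Fraction of positive labels among the k training points nearest to x0.
    nearest = sorted(data, key=lambda point: abs(point[0] - x0))[:k]
    return sum(y for _, y in nearest) / k


def pointwise_bias(x0, k, n=100, n_datasets=500, seed=0):
    # Monte Carlo estimate of E_D[f_hat_D(x0)] - f(x0):
    # refit on many independent training samples and average the prediction at x0.
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_datasets):
        total += knn_prob(sample_dataset(n, rng), x0, k)
    return total / n_datasets - true_prob(x0)


if __name__ == "__main__":
    for k in (1, 10, 50, 100):
        print(k, round(pointwise_bias(x0=0.9, k=k), 3))
```

Swapping in a different `true_prob` (for example a constant one) or a different `x0` is an easy way to check whether the bias under this definition really grows with $k$ in every case. What the sketch does not settle is which definition applies to the hard 0/1 prediction, which is the part I am unsure about.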