Machine learning model accuracy

Question

A machine learning model gets an accuracy of 90% on a dataset with 90% positive class and 10% negative class. Can we conclude that the model is a good classifier of the data?

I think it's pretty accurate. But it was a T/F question in my homework. – emily, Aug 27 '19 at 14:09

score 2 · Answer 1 · answered Aug 27 '19 at 14:15

If this is a classification problem of deciding whether a sample belongs to the positive class, a classifier with 90% accuracy is not necessarily good. Consider the trivial classifier that declares every sample to be positive: on this dataset it also achieves 90% accuracy.

I think "imbalanced dataset" is the appropriate keyword.
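To make the trivial classifier concrete, here is a minimal sketch in plain Python. The 100-sample dataset and the names `always_positive` and `accuracy` are made up for illustration; only the 90/10 class split comes from the question.

```python
# Hypothetical dataset matching the question: 90 positives (1), 10 negatives (0).
labels = [1] * 90 + [0] * 10


def always_positive(sample):
    """Trivial classifier: ignores its input and predicts the positive class."""
    return 1


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


# The inputs are irrelevant to this classifier, so the labels stand in for samples.
predictions = [always_positive(x) for x in labels]
baseline_accuracy = accuracy(labels, predictions)
print(baseline_accuracy)  # 0.9
```

Any model evaluated on this data should be compared against this 0.9 baseline rather than against 0.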

Heedong Do

That makes a lot of sense. Thanks. – emily Aug 27 '19 at 14:40

score 2 · Answer 2 · answered Aug 27 '19 at 14:15

Hint. The question behind the question is this:

Does there exist a completely useless classifier that still achieves $90\%$ accuracy?

If there is such a classifier, then we can't conclude from the given information that our classifier is any good. If there is no such classifier, then our classifier must at least be doing something right.

score 1 · Answer 3 · answered Sep 01 '19 at 12:09

Consider a dataset with 90 examples of class A (say, the positive class) and 10 examples of class B (say, the negative class). A "dumb" model that always predicts class A gets 90 of the 100 examples right, so it reaches 90% accuracy on the training data without learning anything. An accuracy of 90% on a dataset that is 90% class A is therefore no better than this naive baseline, so from the accuracy alone we cannot conclude that the model is a good classifier.
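Written out per class, the arithmetic looks like this. The symbols $TP$, $TN$, $FP$, $FN$ (true/false positives/negatives, with class A as positive) are notation introduced here for illustration. For the always-A model, $TP = 90$, $FN = 0$, $TN = 0$, $FP = 10$, so

$$\text{accuracy} = \frac{TP + TN}{TP + TN + FP + FN} = \frac{90 + 0}{100} = 0.9, \qquad \text{recall}_A = \frac{TP}{TP + FN} = \frac{90}{90} = 1, \qquad \text{recall}_B = \frac{TN}{TN + FP} = \frac{0}{10} = 0.$$

The 90% comes entirely from class A; not a single class B example is recognized.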

I hope that helps.

score 1 · Answer 4 · answered Sep 10 '19 at 13:36

A good tool for this is the confusion matrix: https://en.wikipedia.org/wiki/Confusion_matrix

Suppose you have to test people for a rare disease that only 1% of them have. A classifier that predicts that nobody has the disease would be 99% accurate, yet it would never detect a single case. Judging a classifier by accuracy alone in such a setting is a common misunderstanding in statistics.
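As a rough sketch of the disease example, the confusion-matrix counts can be computed by hand. The population size of 1000 and the helper name `confusion_counts` are assumptions made for illustration; only the 1% prevalence comes from the example above.

```python
from collections import Counter

# Hypothetical population of 1000 people, 1% of whom have the disease (label 1).
y_true = [1] * 10 + [0] * 990
# The trivial classifier from the example: predict "no disease" (0) for everyone.
y_pred = [0] * len(y_true)


def confusion_counts(y_true, y_pred):
    """Return (TP, FP, FN, TN) for binary labels, with 1 as the positive class."""
    pairs = Counter(zip(y_true, y_pred))
    return pairs[(1, 1)], pairs[(0, 1)], pairs[(1, 0)], pairs[(0, 0)]


tp, fp, fn, tn = confusion_counts(y_true, y_pred)
accuracy = (tp + tn) / len(y_true)
recall = tp / (tp + fn)  # share of actual disease cases that were detected

print(tp, fp, fn, tn)  # 0 0 10 990
print(accuracy)        # 0.99
print(recall)          # 0.0
```

The accuracy looks excellent, but the confusion matrix shows that all 10 sick people are missed, which is exactly the information accuracy hides.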
