A machine learning model gets an accuracy of 90% on a dataset with 90% positive class and 10% negative class. Can we conclude that the model is a good classifier of the data?
What do you think? – Arthur Aug 27 '19 at 14:06
I think it's pretty accurate. But it was a T/F question in my homework. – emily Aug 27 '19 at 14:09
4 Answers
If this is a classification problem where the task is to decide whether a sample belongs to the positive class, a classifier with 90% accuracy is not necessarily good. Consider the trivial classifier that declares every sample to be positive: on this dataset it also achieves 90% accuracy.
I think "imbalanced dataset" is the appropriate keyword here.
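To make the trivial classifier concrete, here is a minimal sketch in plain Python. The label encoding (1 for positive, 0 for negative), the dataset size of 100, and the helper names `always_positive` and `accuracy` are assumptions chosen for illustration; they are not part of the original question.

```python
# Hypothetical labels: 90 positives (1) and 10 negatives (0),
# mirroring the 90% / 10% class split in the question.
y_true = [1] * 90 + [0] * 10


def always_positive(sample):
    """Trivial classifier: ignores its input and always predicts positive."""
    return 1


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


# The classifier never looks at the sample, so None is passed as a placeholder.
y_pred = [always_positive(None) for _ in y_true]
baseline_accuracy = accuracy(y_true, y_pred)
print(baseline_accuracy)  # 0.9
```

Any model evaluated on this dataset should be compared against this 0.9 baseline rather than against 0.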
Hint. The question behind the question is this:
Does there exist a completely useless classifier that still achieves a $90\%$ accuracy?
If there is such a classifier, then we can't conclude from the given information that our classifier is any good. If there is no such classifier, then our classifier must at least be doing something right.
Consider a dataset that has 90 examples of class A (say, the positive class) and 10 examples of class B (say, the negative class). We can build a "dumb" model that always predicts class A. On the training data this naive model gets an accuracy of 90%, so an accuracy of 90% on a dataset containing 90% class A is not that impressive. Our model may be doing no better than the dumb one, and hence we cannot conclude that it is a good classifier.
I hope that helps.
A good tool for this is the confusion matrix: https://en.wikipedia.org/wiki/Confusion_matrix
Suppose you have a test for a rare disease that only 1% of people have. A classifier that predicts that nobody has the disease would be 99% accurate, yet it would never detect a single case. Treating high accuracy as proof of a good classifier is a common misunderstanding in statistics.
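The rare-disease example can be sketched as a confusion matrix in plain Python. The population size of 1000, the label encoding (1 for diseased, 0 for healthy), and the helper name `confusion_counts` are assumptions made for illustration only.

```python
# Hypothetical population: 1% diseased (1), 99% healthy (0).
y_true = [1] * 10 + [0] * 990

# Classifier that predicts "no disease" for everyone.
y_pred = [0] * len(y_true)


def confusion_counts(y_true, y_pred):
    """Return (tp, fp, fn, tn), treating label 1 as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == 1 and p == 1 for t, p in pairs)
    fp = sum(t == 0 and p == 1 for t, p in pairs)
    fn = sum(t == 1 and p == 0 for t, p in pairs)
    tn = sum(t == 0 and p == 0 for t, p in pairs)
    return tp, fp, fn, tn


tp, fp, fn, tn = confusion_counts(y_true, y_pred)
accuracy = (tp + tn) / len(y_true)
recall = tp / (tp + fn)  # share of diseased people that were detected

print(tp, fp, fn, tn)  # 0 0 10 990
print(accuracy)        # 0.99
print(recall)          # 0.0
```

The accuracy looks excellent, but the matrix shows that all 10 diseased people fall in the false-negative cell, which the single accuracy number hides.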