I am trying to understand whether the following statement is true: if classifier X has a smaller training error than classifier Y, then classifier X will also have a smaller generalization (test) error than classifier Y. (The answer is False.)
My understanding is below:
Training error and test error can be totally different if the training distribution is not the same as the test distribution. Even under the same distribution they can be very different, because the hypothesis h is picked to minimize training error, not test error, so a classifier that fits the training set more closely may simply be fitting its noise. Is my understanding correct?
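
One way to check this intuition numerically is a small simulation. The sketch below is only an illustration under assumptions that are mine and not part of the original statement: 1-D inputs drawn uniformly from [-1, 1], a true label of 1 when x > 0, 30% label noise, and train and test sets drawn from the same distribution. Classifier X is a 1-nearest-neighbour rule and classifier Y is a single threshold chosen to minimize training error. All names (`sample`, `predict_1nn`, `predict_stump`, `error`) are hypothetical helpers.

```python
import numpy as np

rng = np.random.default_rng(0)

def sample(n, noise=0.3):
    """Draw x ~ Uniform(-1, 1); the clean label is 1 if x > 0,
    and each label is flipped independently with probability `noise`."""
    x = rng.uniform(-1.0, 1.0, size=n)
    y = (x > 0).astype(int)
    flip = rng.random(n) < noise
    return x, np.where(flip, 1 - y, y)

# Train and test sets come from the SAME distribution (an assumption of this sketch).
x_train, y_train = sample(200)
x_test, y_test = sample(5000)

def predict_1nn(x):
    """Classifier X: copy the label of the nearest training point."""
    nearest = np.abs(x[:, None] - x_train[None, :]).argmin(axis=1)
    return y_train[nearest]

# Classifier Y: "predict 1 if x > t", with t chosen to minimize training error.
candidates = np.sort(x_train)
stump_train_errors = [np.mean((x_train > t).astype(int) != y_train) for t in candidates]
t_best = candidates[int(np.argmin(stump_train_errors))]

def predict_stump(x):
    return (x > t_best).astype(int)

def error(predict, x, y):
    """Fraction of points on which the prediction disagrees with the label."""
    return float(np.mean(predict(x) != y))

train_err_X = error(predict_1nn, x_train, y_train)
train_err_Y = error(predict_stump, x_train, y_train)
test_err_X = error(predict_1nn, x_test, y_test)
test_err_Y = error(predict_stump, x_test, y_test)

print(f"X (1-NN):  train={train_err_X:.3f}  test={test_err_X:.3f}")
print(f"Y (stump): train={train_err_Y:.3f}  test={test_err_Y:.3f}")
```

In this setup X memorizes the training set, noisy labels included, so its training error is zero, yet its test error should come out higher than Y's. That gives one counterexample to the statement even when the two distributions are identical.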