
A linear regression model has been created based on a dataset with observations which can be sorted into several different categories.

I have been asked to assess how well this regression model (which was fitted using the entire dataset) fits the observations within each category subset. Correct me if needed, but this seems to render R-squared useless for two reasons: (1) the means of the predictions and of the observations inside a specific category are not equal, and (2) the relationship SSE + SSR = SSTotal no longer holds within each category.
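To make the two points above concrete, here is a minimal sketch using only the Python standard library. The data, the category labels, and the helper names (`fit_ols`, `sums_of_squares`) are all hypothetical and chosen purely for illustration; they are not my actual model or data. It fits one line to the pooled data, then evaluates the sums of squares both overall and inside each category, alongside two per-category summaries (mean residual and RMSE) that do not rely on the decomposition.

```python
from statistics import mean

def fit_ols(x, y):
    """Simple least-squares line y = a + b*x fitted to the data given."""
    xb, yb = mean(x), mean(y)
    sxy = sum((xi - xb) * (yi - yb) for xi, yi in zip(x, y))
    sxx = sum((xi - xb) ** 2 for xi in x)
    b = sxy / sxx
    return yb - b * xb, b

def sums_of_squares(y, fits):
    """Return (SSE, SSR, SST), centring on the mean of y in this subset."""
    yb = mean(y)
    sse = sum((yi - fi) ** 2 for yi, fi in zip(y, fits))
    ssr = sum((fi - yb) ** 2 for fi in fits)
    sst = sum((yi - yb) ** 2 for yi in y)
    return sse, ssr, sst

# Hypothetical toy data: two categories, pooled into one regression.
x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [1.2, 1.9, 3.2, 3.8, 7.1, 8.0, 8.8, 10.1]
cat = ["A", "A", "A", "A", "B", "B", "B", "B"]

# Overall model, fitted on the entire dataset.
a, b = fit_ols(x, y)
fits = [a + b * xi for xi in x]
overall = sums_of_squares(y, fits)

per_category = {}
for c in sorted(set(cat)):
    yc = [yi for yi, ci in zip(y, cat) if ci == c]
    fc = [fi for fi, ci in zip(fits, cat) if ci == c]
    sse, ssr, sst = sums_of_squares(yc, fc)
    per_category[c] = {
        "SSE": sse,
        "SSR": ssr,
        "SST": sst,
        # Mean residual: nonzero when category means of fits and obs differ.
        "bias": mean([yi - fi for yi, fi in zip(yc, fc)]),
        "rmse": (sse / len(yc)) ** 0.5,
        # Can be negative inside a category; not a true R-squared there.
        "one_minus_sse_over_sst": 1 - sse / sst,
    }

print("overall SSE + SSR - SST:", overall[0] + overall[1] - overall[2])
for c, stats in per_category.items():
    print(c, "SSE + SSR - SST:", stats["SSE"] + stats["SSR"] - stats["SST"],
          "bias:", stats["bias"], "rmse:", stats["rmse"])
```

On the pooled data the decomposition holds up to rounding, while inside each category the cross term does not vanish, so the three sums no longer add up.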

Calculating the squared correlation coefficient between the overall model's fits for a category and that category's observations would be equivalent to fitting a new least-squares model specific to that category (regressing the observations on the fits) and calculating its R-squared, but that would not be assessing the fit of the OVERALL model to that category.
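A small sketch of why the correlation approach bothers me, again with hypothetical numbers and helper names (`pearson_r`, `recalibrated_r2`, `overall_model_skill`) invented for illustration. The fits here are deliberately biased high, as the overall model's fits might be within one category. The squared correlation matches the R-squared of a fresh line fitted within the category, whereas 1 - SSE/SST computed from the unchanged fits is much lower.

```python
from statistics import mean

def pearson_r(u, v):
    """Pearson correlation coefficient between two equal-length lists."""
    ub, vb = mean(u), mean(v)
    num = sum((ui - ub) * (vi - vb) for ui, vi in zip(u, v))
    den = (sum((ui - ub) ** 2 for ui in u) * sum((vi - vb) ** 2 for vi in v)) ** 0.5
    return num / den

def recalibrated_r2(obs, fits):
    """R-squared of a NEW least-squares line obs = c + d*fits within the subset."""
    fb, ob = mean(fits), mean(obs)
    d = sum((f - fb) * (o - ob) for f, o in zip(fits, obs)) / sum((f - fb) ** 2 for f in fits)
    c = ob - d * fb
    sse = sum((o - (c + d * f)) ** 2 for o, f in zip(obs, fits))
    sst = sum((o - ob) ** 2 for o in obs)
    return 1 - sse / sst

def overall_model_skill(obs, fits):
    """1 - SSE/SST using the overall model's fits unchanged."""
    ob = mean(obs)
    sse = sum((o - f) ** 2 for o, f in zip(obs, fits))
    sst = sum((o - ob) ** 2 for o in obs)
    return 1 - sse / sst

# Hypothetical observations for one category and the overall model's fits
# for the same rows (biased high on purpose).
obs = [1.2, 1.9, 3.2, 3.8]
fits = [2.0, 2.9, 4.1, 4.9]

r_squared = pearson_r(obs, fits) ** 2
refit_r2 = recalibrated_r2(obs, fits)
skill = overall_model_skill(obs, fits)

print("squared correlation:", r_squared)
print("R-squared of refit within category:", refit_r2)
print("1 - SSE/SST with the overall fits:", skill)
```

The first two numbers agree because the correlation is blind to any shift or rescaling of the fits, which is exactly the kind of misfit I am being asked to detect.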

What other options would you recommend here? Would some kind of likelihood ratio test perhaps be appropriate? I confess I'm not sure what else to do.
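To show what I have in mind by a likelihood ratio test, here is a rough sketch, assuming Gaussian errors and a single predictor. It compares the pooled line against a fuller model that lets every category have its own intercept and slope. The data and the helper `sse_of_line` are hypothetical. Under those assumptions the statistic is n * ln(SSE_pooled / SSE_full), which is approximately chi-squared with degrees of freedom equal to the number of extra parameters. I am not certain this is the right tool, hence the question.

```python
from math import log
from statistics import mean

def sse_of_line(x, y):
    """Residual sum of squares of the least-squares line fitted to (x, y)."""
    xb, yb = mean(x), mean(y)
    sxx = sum((xi - xb) ** 2 for xi in x)
    sxy = sum((xi - xb) * (yi - yb) for xi, yi in zip(x, y))
    b = sxy / sxx
    a = yb - b * xb
    return sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))

# Hypothetical toy data with two categories.
x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [1.2, 1.9, 3.2, 3.8, 7.1, 8.0, 8.8, 10.1]
cat = ["A", "A", "A", "A", "B", "B", "B", "B"]

# Restricted model: one line for all categories (the overall model).
sse_pooled = sse_of_line(x, y)

# Full model: a separate intercept and slope per category.
categories = sorted(set(cat))
sse_full = 0.0
for c in categories:
    xc = [xi for xi, ci in zip(x, cat) if ci == c]
    yc = [yi for yi, ci in zip(y, cat) if ci == c]
    sse_full += sse_of_line(xc, yc)

n = len(y)
lr_statistic = n * log(sse_pooled / sse_full)  # -2 * log likelihood ratio
extra_params = 2 * (len(categories) - 1)       # extra intercepts and slopes

print("LR statistic:", lr_statistic, "df:", extra_params)
```

A large statistic relative to the chi-squared reference would suggest the overall model misses category-specific structure, though with small samples an F-test version of the same comparison may be more trustworthy than the asymptotic approximation.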

Thanks for your help.

Greg
  • Can you elaborate a little more on your data set and the fitted model? If you can upload a sample of the data and write down the (fitted) model, it can help to identify the problem. – V. Vancak Jul 18 '18 at 18:10
  • I'm sorry, but I can't due to company policy. I also do not think that it would be necessary to view the dataset to answer this question, seeing how general it is. – Greg Jul 31 '18 at 00:32