I have a variable in my data, Fruit, that takes discrete values with no canonical order (i.e. it is nominal), e.g. Apple, Orange, Pear.
Each value appears with a certain frequency in my base sample. I also have a subset of that sample containing the same variable, and I would like a measure of how similar the distribution of the Fruit variable in the subset is to its distribution in the overall sample.
For continuous variables I use the z-statistic and the Kolmogorov-Smirnov (K-S) test, and I am looking for something equivalent for my Fruit variable.
I have considered ordering the values by their frequency of occurrence in the original sample, building a fake CDF from that ordering, and applying K-S to it, but that feels like a hack. Well, it would be a hack...
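To make the hack concrete, here is a minimal Python sketch of what I have in mind. The function name `fake_ks_statistic` and the fruit counts are made up for illustration, and it assumes every category in the subset also occurs in the base sample (which holds for a true subset). It only computes the maximum gap between the two pseudo-CDFs, not a p-value.

```python
from collections import Counter


def fake_ks_statistic(base, subset):
    """Largest gap between pseudo-CDFs built by ordering categories
    by their frequency in the base sample (hypothetical helper)."""
    base_counts = Counter(base)
    subset_counts = Counter(subset)

    # Arbitrary ordering: most frequent category in the base sample first,
    # ties broken alphabetically so the result is deterministic.
    order = sorted(base_counts, key=lambda c: (-base_counts[c], c))

    n_base, n_sub = len(base), len(subset)
    cum_base = cum_sub = 0.0
    d = 0.0
    for category in order:
        cum_base += base_counts[category] / n_base
        cum_sub += subset_counts[category] / n_sub
        d = max(d, abs(cum_base - cum_sub))
    return d


# Made-up data: the subset is drawn from the base sample.
base = ["Apple"] * 50 + ["Orange"] * 30 + ["Pear"] * 20
subset = ["Apple"] * 10 + ["Orange"] * 10 + ["Pear"] * 20

print(fake_ks_statistic(base, subset))  # about 0.3
```

The result depends on the ordering I impose on the categories, which is exactly why this feels wrong for a variable with no natural order.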
I could also invent something that takes a weighted difference of the populations, but I would rather use a conventional statistic if such a thing exists.