Cohen's Kappa with different class sizes

Question

For my research I need to use Cohen's Kappa to find out whether there is agreement. I asked four people to cluster 38 words and got the following results:

Person 1: 6 clusters
Person 2: 7 clusters
Person 3: 11 clusters
Person 4: 9 clusters

In the example given by Landis and Koch (1977), the classes are fixed and two observers are compared with each other. Can I also apply kappa to my example? I have the following questions:

I don't know which cluster each word truly belongs to. I could look at the overlap between clusters and treat the two clusters with the highest overlap as the same cluster.
The cluster counts differ: when comparing person 1 to person 2, person 2 has one more cluster. Should I add an empty cluster to person 1's result to make the cluster counts equal?
Can I compare all four people at once, or should I compare two people per test, i.e. 1->2, 1->3, 1->4, taking person 1's results as the starting point? In the experiment I want to see how other people would cluster the same words, and the words were produced by person 1.
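
The overlap idea in the first question can be made concrete. Below is a minimal Python sketch, not part of the original thread, that relabels one person's clusters to match a reference person (e.g. person 1) by greedily pairing the clusters that share the most words. The name `match_clusters` is hypothetical, and the sketch assumes integer cluster labels listed in the same word order for both people. Clusters left without a partner (such as person 2's extra cluster) receive fresh labels, so no empty cluster has to be added by hand.

```python
from collections import Counter


def match_clusters(reference, other):
    """Hypothetical helper: relabel `other` so that its clusters take the
    labels of the `reference` clusters they overlap most (greedy matching).

    reference, other: lists of integer cluster labels, one per word,
    in the same word order.
    """
    # (reference label, other label) -> number of words the two clusters share
    overlap = Counter(zip(reference, other))
    mapping, taken = {}, set()
    # Visit pairs from largest overlap to smallest; ties broken by label order.
    for (ref_label, other_label), _ in sorted(
        overlap.items(), key=lambda kv: (-kv[1], kv[0])
    ):
        if other_label not in mapping and ref_label not in taken:
            mapping[other_label] = ref_label
            taken.add(ref_label)
    # Clusters of `other` with no partner left get labels unused by `reference`.
    next_label = max(reference) + 1
    for label in sorted(set(other)):
        if label not in mapping:
            mapping[label] = next_label
            next_label += 1
    return [mapping[label] for label in other]
```

Greedy pairing is only one choice; an optimal one-to-one assignment that maximizes total overlap can be obtained with the Hungarian algorithm, for example `scipy.optimize.linear_sum_assignment` with `maximize=True`.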

Is everything OK with your kappa analysis? Did you run into any other problems with arranging the data? — Anatoly, Jul 06 '14 at 21:01

Anatoly · Accepted Answer · 2014-07-03T22:00:40.797

Since there are multiple raters and multiple categories/clusters, you have to use a generalized Cohen's kappa (the standard Cohen's kappa handles only two raters, although it allows any number of categories) or Fleiss' kappa.
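
For reference, here is a minimal sketch of the standard two-rater Cohen's kappa, kappa = (p_o - p_e) / (1 - p_e), where p_o is the observed proportion of agreement and p_e is the agreement expected by chance from each rater's label frequencies. The function name is illustrative. It assumes both lists use the same label for the same cluster; for clustering data that means the labels have been matched across people first.

```python
from collections import Counter


def cohen_kappa(r1, r2):
    """Cohen's kappa for two raters who labelled the same items.
    Any number of categories is allowed."""
    if len(r1) != len(r2) or not r1:
        raise ValueError("need two equally long, non-empty rating lists")
    n = len(r1)
    # Observed agreement: share of items given the same label by both raters.
    p_o = sum(a == b for a, b in zip(r1, r2)) / n
    # Chance agreement from each rater's marginal label frequencies.
    c1, c2 = Counter(r1), Counter(r2)
    p_e = sum(c1[k] * c2[k] for k in c1) / (n * n)
    if p_e == 1:
        return float("nan")  # both raters used one identical label: undefined
    return (p_o - p_e) / (1 - p_e)
```

If scikit-learn is available, `sklearn.metrics.cohen_kappa_score` computes the same statistic.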

Most statistical software packages accept two types of data input. The first is arranged to show rating scores: a table is created in which each column represents one rater, each row represents an item (in your case, a word), and each cell holds the category (in your case, the cluster) that rater assigned to that item. In your case you would need a 38x4 table (38 rows, 4 columns).

The second type of data input is arranged as a "table of counts", and it is the one actually used to calculate kappa (software typically converts tables of the first type into a table of counts before calculating kappa). Here each row represents an item, each column represents a category/cluster, and each cell holds the number of raters who assigned that item to that category, so each row sums to the number of raters. In your case you would need a 38x11 table (38 rows, 11 columns, one per cluster).
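
The conversion from the first layout to the table of counts can be sketched as follows (Python, with a hypothetical function name). Each row of `ratings` holds one word's cluster labels from all raters; each row of the result holds, for every cluster, how many raters chose it, with zeros for clusters nobody chose for that word.

```python
def ratings_to_counts(ratings, categories=None):
    """Convert a ratings table (rows = items, columns = raters,
    cell = assigned category) into a table of counts
    (rows = items, columns = categories, cell = number of raters)."""
    if categories is None:
        # Every label used by any rater becomes a column, in sorted order.
        categories = sorted({c for row in ratings for c in row})
    index = {c: j for j, c in enumerate(categories)}
    counts = []
    for row in ratings:
        out = [0] * len(categories)  # categories nobody chose stay at zero
        for c in row:
            out[index[c]] += 1
        counts.append(out)
    return counts, categories
```

In statsmodels, `statsmodels.stats.inter_rater.aggregate_raters` performs a comparable conversion.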

Regarding the issues raised in the question:

knowing the "true" category/cluster is not necessary: the kappa test measures only agreement among raters, not who is right or wrong;
the different category/cluster counts between raters are not a problem. In the first type of data input, each rater's column simply contains the cluster labels that rater actually used, so unused clusters never appear there. In the second type, write in each cell the number of raters who assigned that word to that cluster, which is zero for any cluster no rater chose for that word;
the concordance kappa test is usually performed across all raters at once, not pairwise.
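
A minimal sketch of Fleiss' kappa computed across all raters from a table of counts, using the usual definitions: p_j is the share of all assignments that went to category j, P_i is the proportion of agreeing rater pairs for item i, and kappa = (mean of P_i - sum of p_j^2) / (1 - sum of p_j^2). The function name is illustrative. It assumes every item was rated by the same number of raters (here, all four people clustered all 38 words) and, as with any kappa, that a cluster label means the same thing for every rater.

```python
def fleiss_kappa(counts):
    """Fleiss' kappa from a table of counts: rows = items,
    columns = categories, every row summing to the number of raters n."""
    if not counts:
        raise ValueError("empty table")
    n = sum(counts[0])
    if n < 2 or any(sum(row) != n for row in counts):
        raise ValueError("every item needs the same number (>= 2) of ratings")
    N, k = len(counts), len(counts[0])
    # p_j: share of all N * n assignments that went to category j.
    p = [sum(row[j] for row in counts) / (N * n) for j in range(k)]
    # P_i: fraction of rater pairs that agree on item i.
    P_i = [(sum(x * x for x in row) - n) / (n * (n - 1)) for row in counts]
    P_bar = sum(P_i) / N
    P_e = sum(pj * pj for pj in p)
    if P_e == 1:
        return float("nan")  # all ratings in one category: kappa undefined
    return (P_bar - P_e) / (1 - P_e)
```

For a library implementation, see `statsmodels.stats.inter_rater.fleiss_kappa`.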

Thank you for your answer, and sorry for the delayed reply; I have a lot on my plate finishing my thesis. I looked into Cohen's Kappa with multiple raters and found an article by Davies and Fleiss (1982), which states that you can compute Cohen's Kappa for each pair of raters and take the average. Fleiss' Kappa suits my research better because it is made for multiple raters. I could not find a working script for SPSS, so at the moment I am calculating it by hand. I want to compare the Fleiss' Kappa result with the averaged pairwise kappa to see how they differ. — Pakspul, Jul 10 '14 at 09:59
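
The averaging approach from the comment above (Cohen's kappa averaged over rater pairs, sometimes called Light's kappa) can be sketched as follows. This is an illustration under the assumption that cluster labels have already been matched across people; with four people there are six pairs. The helper names are hypothetical, and a compact Cohen's kappa is included so the block runs on its own.

```python
from collections import Counter
from itertools import combinations


def cohen_kappa(r1, r2):
    """Two-rater Cohen's kappa (raises ZeroDivisionError if chance agreement is 1)."""
    n = len(r1)
    p_o = sum(a == b for a, b in zip(r1, r2)) / n
    c1, c2 = Counter(r1), Counter(r2)
    p_e = sum(c1[k] * c2[k] for k in c1) / (n * n)
    return (p_o - p_e) / (1 - p_e)


def mean_pairwise_kappa(raters):
    """Average Cohen's kappa over every pair of raters.
    raters: one list of cluster labels per rater, same word order."""
    kappas = [cohen_kappa(a, b) for a, b in combinations(raters, 2)]
    return sum(kappas) / len(kappas)
```

To follow the question's design with person 1 as the reference instead, average `cohen_kappa(raters[0], r)` over `r` in `raters[1:]`.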
