
If I have two sets, I can calculate their similarity coefficient using the Jaccard index. Is there an algorithm that can calculate similarity when the sets have a different number of elements? For example, consider these two pairs of sets:

{A1,B1,C1} and {A1,B2,C1,D1}

{A1,B1,C1} and {A1,B3,C4,D5}

I can see that the first pair is more similar, but how can I calculate that mathematically?

1 Answer


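The Jaccard index $\displaystyle J(A,B) = {{|A \cap B|}\over{|A \cup B|}} = {{|A \cap B|}\over{|A| + |B| - |A \cap B|}}$ handles differently sized sets as part of its definition: it depends only on the sizes of the intersection and the union, not on the two sets having the same number of elements.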

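In your examples it would give $\dfrac25$ for the first pair (2 shared elements out of 5 distinct elements) and $\dfrac16$ for the second pair (1 shared element out of 6 distinct elements), and the first value is certainly higher than the second.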
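If it helps to check the arithmetic, here is a minimal Python sketch of the same calculation. The function name `jaccard` and the convention of returning 1 for two empty sets are my own assumptions, not part of the definition above (which is undefined when the union is empty).

```python
from fractions import Fraction


def jaccard(a: set, b: set) -> Fraction:
    """Return |a ∩ b| / |a ∪ b| as an exact fraction."""
    union = a | b
    if not union:
        # Assumed convention: two empty sets are treated as identical.
        return Fraction(1)
    return Fraction(len(a & b), len(union))


# The two pairs from the question.
first = jaccard({"A1", "B1", "C1"}, {"A1", "B2", "C1", "D1"})
second = jaccard({"A1", "B1", "C1"}, {"A1", "B3", "C4", "D5"})

print(first, second)  # 2/5 1/6
```

Using `Fraction` keeps the results exact, so they can be compared directly with the values worked out by hand.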

Henry
  • 157,058