
Jaccard’s Coefficient

 石头狗 2009-01-21

Jaccard's coefficient (a measure of similarity) and Jaccard's distance (a measure of dissimilarity) are measures of asymmetric information on binary (and non-binary) variables. Compare Jaccard's coefficient with the simple matching coefficient.

For some applications, the presence of $s$ in the simple matching coefficient makes no sense, because $s$ counts double absences. This happens when positive and negative values do not carry equal information (asymmetry). For example, if the negative value is not important, counting the variables that are absent in both objects contributes nothing meaningful to the similarity or dissimilarity. The simple matching coefficient is

$$s_{ij} = \frac{p + s}{t}$$

and Jaccard's coefficient removes $s$ from it to become

$$s_{ij} = \frac{p}{p + q + r}$$

Where

$p$ = number of variables that are positive for both objects

$q$ = number of variables that are positive for the $i$th object and negative for the $j$th object

$r$ = number of variables that are negative for the $i$th object and positive for the $j$th object

$s$ = number of variables that are negative for both objects

$t = p + q + r + s$ = total number of variables

Jaccard's distance can be obtained from

$$d_{ij} = 1 - s_{ij}$$

Thus,

$$d_{ij} = \frac{q + r}{p + q + r}$$
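A minimal Python sketch of the formulas above, assuming each object is given as an equal-length list of 0/1 values (object $i$ first, object $j$ second). The function names `binary_counts`, `simple_matching`, `jaccard_coefficient` and `jaccard_distance` are hypothetical, chosen for illustration. The second half of the demo illustrates the asymmetry argument: padding both objects with variables that are absent in both raises the simple matching coefficient but leaves Jaccard's coefficient unchanged.

```python
from typing import Sequence, Tuple


def binary_counts(x: Sequence[int], y: Sequence[int]) -> Tuple[int, int, int, int]:
    """Return (p, q, r, s) for 0/1 vectors x (object i) and y (object j)."""
    if len(x) != len(y):
        raise ValueError("both objects must have the same number of variables")
    pairs = list(zip(x, y))
    p = pairs.count((1, 1))  # positive for both objects
    q = pairs.count((1, 0))  # positive for i, negative for j
    r = pairs.count((0, 1))  # negative for i, positive for j
    s = pairs.count((0, 0))  # negative for both (double absence)
    return p, q, r, s


def simple_matching(x: Sequence[int], y: Sequence[int]) -> float:
    p, q, r, s = binary_counts(x, y)
    t = p + q + r + s  # total number of variables
    if t == 0:
        raise ValueError("no variables to compare")
    return (p + s) / t


def jaccard_coefficient(x: Sequence[int], y: Sequence[int]) -> float:
    p, q, r, _s = binary_counts(x, y)  # s (double absence) is ignored
    if p + q + r == 0:
        raise ValueError("undefined (0/0) when no variable is positive in either object")
    return p / (p + q + r)


def jaccard_distance(x: Sequence[int], y: Sequence[int]) -> float:
    p, q, r, _s = binary_counts(x, y)
    if p + q + r == 0:
        raise ValueError("undefined (0/0) when no variable is positive in either object")
    return (q + r) / (p + q + r)


x = [1, 1, 0, 0]
y = [1, 0, 1, 0]
print(simple_matching(x, y), jaccard_coefficient(x, y))  # SMC = 2/4, Jaccard = 1/3

# Append six variables that are absent (0) in both objects: only s grows.
x_padded = x + [0] * 6
y_padded = y + [0] * 6
print(simple_matching(x_padded, y_padded), jaccard_coefficient(x_padded, y_padded))  # SMC = 8/10, Jaccard still 1/3
```

Raising an error when $p + q + r = 0$ is one design choice; some implementations return a fixed value for two all-negative objects instead.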

(The original article has a small JavaScript demonstration program at this point.)

Example 1

Feature of fruit  | Sphere shape | Sweet | Sour | Crunchy
Object i = Apple  | Yes          | Yes   | Yes  | Yes
Object j = Banana | No           | Yes   | No   | No

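The coordinates of Apple are (1,1,1,1) and the coordinates of Banana are (0,1,0,0). Because each object is represented by 4 variables, we say that these objects have 4 dimensions. Here $p = 1$ (sweet) and $q = 3$ (sphere shape, sour, crunchy), while $r = 0$ and $s = 0$.

Jaccard's coefficient between Apple and Banana is $s_{ij} = \frac{1}{1 + 3 + 0} = \frac{1}{4}$. Jaccard's distance between Apple and Banana is $d_{ij} = \frac{3 + 0}{1 + 3 + 0} = \frac{3}{4}$.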
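The counts in Example 1 can be checked with a few lines of Python. The 1/0 encoding of Yes/No follows the coordinates given above; the variable names are illustrative only.

```python
# Feature order: sphere shape, sweet, sour, crunchy (1 = Yes, 0 = No)
apple = [1, 1, 1, 1]   # object i
banana = [0, 1, 0, 0]  # object j

pairs = list(zip(apple, banana))
p = pairs.count((1, 1))  # positive for both: sweet
q = pairs.count((1, 0))  # Apple only: sphere shape, sour, crunchy
r = pairs.count((0, 1))  # Banana only: none
s = pairs.count((0, 0))  # absent in both: none

coefficient = p / (p + q + r)
distance = (q + r) / (p + q + r)
print(p, q, r, s)             # 1 3 0 0
print(coefficient, distance)  # 0.25 0.75
```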

 

For non-binary data, Jaccard's coefficient can also be computed using set relations.

Example 2

Suppose we have two sets, $A$ and $B$.

Then the union is $A \cup B$ and the intersection of the two sets is $A \cap B$. Jaccard's coefficient is the number of elements in the intersection divided by the number of elements in the union:

$$J(A, B) = \frac{|A \cap B|}{|A \cup B|}$$
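The set formula maps directly onto Python's built-in `set` operators (`&` for intersection, `|` for union). The function name `jaccard_sets` and the example sets below are hypothetical, made up for illustration. Returning 1.0 for two empty sets is an assumed convention, since the formula itself gives 0/0 in that case.

```python
def jaccard_sets(a: set, b: set) -> float:
    """|A ∩ B| / |A ∪ B|; returns 1.0 for two empty sets (an assumed convention)."""
    union = a | b
    if not union:
        return 1.0
    return len(a & b) / len(union)


# Hypothetical example sets
A = {"apple", "banana", "cherry", "date"}
B = {"banana", "date", "fig"}
print(sorted(A | B))       # ['apple', 'banana', 'cherry', 'date', 'fig']
print(sorted(A & B))       # ['banana', 'date']
print(jaccard_sets(A, B))  # 0.4
```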

 

Of course, the set formula also works for binary data, but we need to compute each digit using Boolean algebra (A AND B is true only if both are true; A OR B is false only if both are false). Set intersection is equivalent to AND, while set union is equivalent to OR.
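One way to apply this AND/OR equivalence in code is to pack each binary vector into a Python integer and use the bitwise operators `&` and `|`, then count the 1 bits. This is a sketch under the assumption that both vectors have the same length; the helper names `to_mask`, `popcount` and `jaccard_bitwise` are hypothetical.

```python
def to_mask(bits) -> int:
    """Pack a sequence of 0/1 digits into an int (first digit = most significant bit)."""
    mask = 0
    for bit in bits:
        mask = (mask << 1) | bit
    return mask


def popcount(n: int) -> int:
    """Count the 1 bits, i.e. the sum of all digits."""
    return bin(n).count("1")


def jaccard_bitwise(a_bits, b_bits) -> float:
    if len(a_bits) != len(b_bits):
        raise ValueError("both vectors must have the same length")
    a, b = to_mask(a_bits), to_mask(b_bits)
    intersection = popcount(a & b)  # AND <-> set intersection
    union = popcount(a | b)         # OR  <-> set union
    if union == 0:
        raise ValueError("undefined (0/0) when both vectors are all zeros")
    return intersection / union


A = [1, 1, 1, 1]  # Apple
B = [0, 1, 0, 0]  # Banana
print(bin(to_mask(A) & to_mask(B)))  # 0b100  (A AND B = 0100)
print(bin(to_mask(A) | to_mask(B)))  # 0b1111 (A OR B = 1111)
print(jaccard_bitwise(A, B))         # 0.25
```

Leading zeros disappear when a vector is packed into an integer, but this does not affect the result, because zero digits add nothing to either bit count.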

Example 3

Let us use the example above, with A = Apple and B = Banana:

A       | 1 | 1 | 1 | 1
B       | 0 | 1 | 0 | 0
A AND B | 0 | 1 | 0 | 0
A OR B  | 1 | 1 | 1 | 1

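The sum of the digits in each row can be used to compute Jaccard's coefficient:

$$J(A, B) = \frac{\sum (A \text{ AND } B)}{\sum (A \text{ OR } B)} = \frac{0 + 1 + 0 + 0}{1 + 1 + 1 + 1} = \frac{1}{4}$$

This is the same result as Example 1 above.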

