Jaccard Similarity and Distance in Python


In this tutorial, we will explore how to calculate the Jaccard similarity and Jaccard distance in Python. Let us start off by understanding what the two terms mean and how to compute them.


What is Jaccard Similarity and Distance?

Jaccard similarity is a popular proximity measure that determines how similar two items are, such as two texts. For two sets A and B, the similarity (also called the Jaccard index) is the size of their intersection divided by the size of their union:

$$J(A, B) = \frac{|A \cap B|}{|A \cup B|}$$
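To see the formula in action, here is a minimal sketch that computes the similarity directly from the definition. The function name `jaccard_similarity` and the example sets are illustrative only. The formula is undefined when both sets are empty; the sketch assumes the common convention of treating two empty sets as identical (similarity 1).

```python
def jaccard_similarity(a: set, b: set) -> float:
    """Return |A ∩ B| / |A ∪ B| for two sets."""
    union = a | b
    if not union:
        # Assumed convention: two empty sets count as identical
        return 1.0
    return len(a & b) / len(union)


A = {"apple", "banana", "cherry"}
B = {"banana", "cherry", "date"}

# Shared items: banana, cherry (2); distinct items overall: 4
print(jaccard_similarity(A, B))  # 0.5
```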

The Jaccard distance, as opposed to the Jaccard similarity (Jaccard index), measures how dissimilar two sets are. It is the size of the union minus the size of the intersection, divided by the size of the union, which is the same as one minus the Jaccard similarity:

$$d_J(A, B) = \frac{|A \cup B| - |A \cap B|}{|A \cup B|} = 1 - J(A, B)$$
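The elements counted by |A ∪ B| − |A ∩ B| are exactly those that belong to one set but not the other, i.e. the symmetric difference of A and B. A small check of both forms of the distance, using illustrative sets and the assumed convention that two empty sets are at distance 0:

```python
import math


def jaccard_distance(a: set, b: set) -> float:
    """Return (|A ∪ B| - |A ∩ B|) / |A ∪ B| for two sets."""
    union = a | b
    if not union:
        # Assumed convention: two empty sets are at distance 0
        return 0.0
    return (len(union) - len(a & b)) / len(union)


A = {1, 2, 3, 4}
B = {3, 4, 5}

# Union minus intersection has the same size as the symmetric difference
assert len(A | B) - len(A & B) == len(A ^ B)  # 5 - 2 == 3

similarity = len(A & B) / len(A | B)   # 2 / 5
distance = jaccard_distance(A, B)      # 3 / 5
print(similarity, distance)            # 0.4 0.6
assert math.isclose(distance, 1 - similarity)
```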

Code Implementation in Python

Now that we know what both terms mean and have the formulas for the similarity index and the distance, we can move on to implementing both in Python.

Take User Input for Both Sets

We let the user enter the elements of each set as space-separated integers. Because the values are stored in a set, any duplicates are dropped automatically:

S1 = set(map(int,input("Enter elements of set 1: ").split()))
S2 = set(map(int,input("Enter elements of set 2: ").split()))
print("The two sets are : \n",S1,"\n",S2)
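Because `input()` waits for the keyboard, it can be handy to move the parsing into a function that takes a string, so the same logic can be tested without typing. `parse_int_set` is a hypothetical helper name, not part of the original code; it applies the same `set(map(int, ...split()))` expression as the lines above.

```python
def parse_int_set(text: str) -> set:
    """Parse space-separated integers such as '3 5 2 1' into a set.

    Hypothetical helper: same expression as the input() lines above.
    Raises ValueError if any token is not an integer.
    """
    return set(map(int, text.split()))


S1 = parse_int_set("3 5 2 1")
S2 = parse_int_set("6 3 1 6")   # the repeated 6 is stored only once
print("The two sets are : \n", S1, "\n", S2)
```

In the interactive version the same helper could be called as `S1 = parse_int_set(input("Enter elements of set 1: "))`.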

Computing the Jaccard Similarity and Distance

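Next, we write a function that takes both input sets as parameters, computes the similarity and the distance using set operations, and returns both values. For the distance, the numerator |A ∪ B| − |A ∩ B| is the number of elements that belong to exactly one of the two sets, which is what symmetric_difference() returns: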

def jaccard_similarity_n_distance(A, B):
    # Both measures divide by the size of the union
    union = A.union(B)
    if not union:
        # Two empty sets: treat them as identical (avoids ZeroDivisionError)
        return (1.0, 0.0)

    # Compute Jaccard Similarity: |A ∩ B| / |A ∪ B|
    numerator = A.intersection(B)
    Jacc_similarity = len(numerator)/len(union)

    # Compute Jaccard Distance: |A Δ B| / |A ∪ B|
    numerator = A.symmetric_difference(B)
    Jacc_distance = len(numerator)/len(union)

    return (Jacc_similarity, Jacc_distance)

Result = jaccard_similarity_n_distance(S1,S2)
print("Jaccard Similarity : ",Result[0])
print("Jaccard Distance : ",Result[1])
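Python's set operators `&`, `|` and `^` are shorthand for `intersection()`, `union()` and `symmetric_difference()`, so the function can also be written more compactly. A sketch of that variant, keeping the same name and return value and assuming the convention that two empty sets are identical:

```python
def jaccard_similarity_n_distance(A: set, B: set) -> tuple:
    union = A | B                          # same as A.union(B)
    if not union:
        # Assumed convention: two empty sets are identical
        return (1.0, 0.0)
    similarity = len(A & B) / len(union)   # A.intersection(B)
    distance = len(A ^ B) / len(union)     # A.symmetric_difference(B)
    return (similarity, distance)


Result = jaccard_similarity_n_distance({1, 2, 3, 5}, {2, 3, 5, 6})
print("Jaccard Similarity : ", Result[0])   # 0.6
print("Jaccard Distance : ", Result[1])     # 0.4
```

One difference to keep in mind: the operators require both operands to be sets (or frozensets), whereas the methods accept any iterable, for example `A.union([1, 2])`.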

The Complete Code for Jaccard Similarity and Distance

S1 = set(map(int,input("Enter elements of set 1: ").split()))
S2 = set(map(int,input("Enter elements of set 2: ").split()))
print("The two sets are : \n",S1,"\n",S2)

def jaccard_similarity_n_distance(A, B):
    # Both measures divide by the size of the union
    union = A.union(B)
    if not union:
        # Two empty sets: treat them as identical (avoids ZeroDivisionError)
        return (1.0, 0.0)

    # Compute Jaccard Similarity: |A ∩ B| / |A ∪ B|
    numerator = A.intersection(B)
    Jacc_similarity = len(numerator)/len(union)

    # Compute Jaccard Distance: |A Δ B| / |A ∪ B|
    numerator = A.symmetric_difference(B)
    Jacc_distance = len(numerator)/len(union)

    return (Jacc_similarity, Jacc_distance)

Result = jaccard_similarity_n_distance(S1,S2)
print()
print("Jaccard Similarity : ",Result[0])
print("Jaccard Distance : ",Result[1])

Some Sample Outputs

Here are two sample runs of the complete program. Note that in the second run the repeated 6 appears only once in set 2, because sets discard duplicates.

Enter elements of set 1: 3 5 2 1
Enter elements of set 2: 5 3 2 6
The two sets are : 
 {1, 2, 3, 5} 
 {2, 3, 5, 6}

Jaccard Similarity :  0.6
Jaccard Distance :  0.4

Enter elements of set 1: 5 3 4 7
Enter elements of set 2: 6 3 1 6
The two sets are : 
 {3, 4, 5, 7} 
 {1, 3, 6}

Jaccard Similarity :  0.16666666666666666
Jaccard Distance :  0.8333333333333334
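As mentioned at the start, Jaccard similarity is often applied to text by comparing the sets of words two documents contain. A minimal sketch, assuming naive tokenization (lowercasing and splitting on whitespace, with no punctuation handling); `word_set` is a hypothetical helper, and the empty-set guard is an assumed convention:

```python
def word_set(text: str) -> set:
    # Naive tokenization (assumption): lowercase, split on whitespace
    return set(text.lower().split())


def jaccard_similarity_n_distance(A: set, B: set) -> tuple:
    union = A | B
    if not union:
        # Assumed convention: two empty sets are identical
        return (1.0, 0.0)
    return (len(A & B) / len(union), len(A ^ B) / len(union))


t1 = "The cat sat on the mat"
t2 = "The dog sat on the log"
A, B = word_set(t1), word_set(t2)

# Shared words: the, sat, on (3); distinct words overall: 7
similarity, distance = jaccard_similarity_n_distance(A, B)
print(similarity, distance)  # 3/7 and 4/7
```

Because the texts are reduced to sets, word order and repetition are ignored: "the" counts once even though it appears twice in each sentence.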

Conclusion

We looked at Jaccard similarity (index) and Jaccard distance, as well as how to compute them in Python. If you have any questions or recommendations, please post them in the comments section below.

Thank you for reading!

I recommend reading the following tutorials as well:

  1. Geocodes in Python for Distance Measuring
  2. K-Nearest Neighbors (KNN) in Python
  3. Convert Kilometers to Miles using Python
