Recall in Python – An Important Error Metric to Know!


Hello, folks! Good to see you all again! 🙂 Today, we will be focusing on an Important Error Metric – Recall in Python.

Let us begin!


First, what is an Error Metric?

In the domain of data science and machine learning, where we implement models for predictions and real-life problems, it is very important to understand the effect of every model or algorithm on the data values.

Now, the question arises: how are we going to check the effect of every model on our data?

This is where error metrics come into the picture. Error metrics are the different aspects through which we can check the accuracy and closeness of a model's fit to the data values.

There are various error metrics for regression as well as classification models, such as the confusion matrix, accuracy, precision, recall, and the F1 score.

Today, we will be focusing on Recall in Python as the error metric!


Recall in Python as an Error Metric!

“Recall” is a classification error metric. It evaluates the outcome of classification algorithms, for which the target/response value is a category.

Basically, Recall in Python measures the proportion of actually positive values that the model also predicts as positive. In other words, it represents the percentage of values that are truly labelled positive and are predicted correctly as well.

Let us try to understand this with the help of an example! Consider a variable ‘Pole’ with values ‘True, False’. The job of the Recall error metric would be to find out how well the model works in the below scenario, i.e. how many of the values that were actually labelled True were also predicted as True.

So, technically speaking, recall is the error metric that accounts for the ability of the classifier to predict the positively labelled samples correctly.

Recall = True Positive/ (True Positive + False Negative)
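As a quick numeric check of this formula, suppose (hypothetically) a classifier produces 30 true positives and 10 false negatives on the positive class:

```python
# Hypothetical counts for the positive class of a binary classifier
true_positive = 30
false_negative = 10

# Recall = TP / (TP + FN)
recall = true_positive / (true_positive + false_negative)
print(recall)  # 0.75, i.e. 75% of the actual positives were caught
```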

Let us now implement the concept of Recall with various examples in the below section.


1. Recall with Decision Trees

Let us begin with importing the dataset! We have used the Bike Prediction dataset and imported it using the pandas.read_csv() function.

You can find the dataset here.

Loading the dataset

import pandas as pd
bike = pd.read_csv("Bike.csv")

Splitting the dataset

We have segregated the dataset into training and testing sets using the train_test_split() function.

# Separating the dependent and independent data variables into two dataframes.
from sklearn.model_selection import train_test_split 
X = bike.drop(['cnt'],axis=1) 
Y = bike['cnt']
# Splitting the dataset into 80% training data and 20% testing data.
X_train, X_test, Y_train, Y_test = train_test_split(X, Y, test_size=.20, random_state=0)

Now, time to define the Error Metrics!

We have created a custom function err_metric() that calculates the precision, recall, accuracy, specificity, and F1 score as shown below:

# Error metrics: precision, accuracy, recall, specificity, FPR, FNR and F1 score
def err_metric(CM): 
    # CM is a confusion matrix (pd.crosstab) with actual values as rows
    # and predicted values as columns
    TN = CM.iloc[0,0]
    FN = CM.iloc[1,0]
    TP = CM.iloc[1,1]
    FP = CM.iloc[0,1]
    precision =(TP)/(TP+FP)
    accuracy_model  =(TP+TN)/(TP+TN+FP+FN)
    recall_score  =(TP)/(TP+FN)
    specificity_value =(TN)/(TN + FP)
    
    False_positive_rate =(FP)/(FP+TN)
    False_negative_rate =(FN)/(FN+TP)
    f1_score =2*(( precision * recall_score)/( precision + recall_score))
    print("Precision value of the model: ",precision)
    print("Accuracy of the model: ",accuracy_model)
    print("Recall value of the model: ",recall_score)
    print("Specificity of the model: ",specificity_value)
    print("False Positive rate of the model: ",False_positive_rate)
    print("False Negative rate of the model: ",False_negative_rate)
    print("f1 score of the model: ",f1_score)
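Before running it on the real data, here is a toy illustration (with made-up labels, purely for demonstration) of how pd.crosstab lays out the confusion matrix that err_metric() indexes: actual values form the rows and predicted values the columns.

```python
import pandas as pd

# Made-up actual/predicted binary labels for illustration only
actual    = [0, 0, 0, 1, 1, 1, 1, 1]
predicted = [0, 1, 0, 1, 1, 0, 1, 1]

# Rows = actual values, columns = predicted values:
#   CM.iloc[0,0] = TN, CM.iloc[0,1] = FP
#   CM.iloc[1,0] = FN, CM.iloc[1,1] = TP
CM = pd.crosstab(pd.Series(actual), pd.Series(predicted))
TN, FP = CM.iloc[0, 0], CM.iloc[0, 1]
FN, TP = CM.iloc[1, 0], CM.iloc[1, 1]
print(TP / (TP + FN))  # recall = 4/5 = 0.8
```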

Implementing the model!

Let us now apply the Decision Tree model to our dataset. We have used the DecisionTreeClassifier() method to fit it on our data.

#Decision Trees
from sklearn.tree import DecisionTreeClassifier
decision = DecisionTreeClassifier(max_depth=6, class_weight='balanced', random_state=0).fit(X_train, Y_train)
target = decision.predict(X_test)
targetclass_prob = decision.predict_proba(X_test)[:, 1]
confusion_matrix = pd.crosstab(Y_test,target)
err_metric(confusion_matrix)

Output:

As seen below, we get a Recall value of 0.57, i.e. 57%, which means that 57% of the actually positive samples were also predicted correctly.

Precision value of the model:  0.25
Accuracy of the model:  0.6028368794326241
Recall value of the model:  0.5769230769230769
Specificity of the model:  0.6086956521739131
False Positive rate of the model:  0.391304347826087
False Negative rate of the model:  0.4230769230769231
f1 score of the model:  0.3488372093023256
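As a sanity check, the printed F1 score is consistent with the precision and recall values above, since F1 is their harmonic mean:

```python
# Precision and recall values taken from the model output above
precision = 0.25
recall = 0.5769230769230769

# F1 = harmonic mean of precision and recall
f1 = 2 * (precision * recall) / (precision + recall)
print(f1)  # ~0.3488, matching the f1 score printed by err_metric()
```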

2. Recall in Python using sklearn library

Python's sklearn library offers the recall_score() method, which computes the recall value for a set of data values.

Syntax:

recall_score(x, y, average='weighted')
  • x: Actual (ground-truth) values
  • y: Predicted values
  • average: averaging strategy; one of [None, ‘binary’ (default), ‘micro’, ‘macro’, ‘samples’, ‘weighted’]

In the below example, x refers to the actual set of values while y represents the predicted values. Since the labels here are multiclass rather than binary, we pass average='weighted'.

from sklearn.metrics import recall_score
x = [10,20,30,40,50,60]
y = [10,21,30,40,50,80]
print("Recall value:")
recall_score(x, y, average='weighted')

Output:

Recall value:
0.6666666666666666
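For the more common binary case, recall_score() can be called with its default average='binary', which scores the positive class only. A small sketch with made-up labels:

```python
from sklearn.metrics import recall_score

# Made-up binary labels: 5 actual positives, 4 of them predicted correctly
y_true = [1, 0, 1, 1, 0, 1, 1]
y_pred = [1, 0, 1, 0, 0, 1, 1]

print(recall_score(y_true, y_pred))  # 4/5 = 0.8
```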

Conclusion

By this, we have come to the end of this topic. Feel free to comment below in case you come across any questions.

For a deeper understanding, try executing the concept of recall with various datasets and do let us know your experience in the comment box!

Till then, Stay tuned!

See you in the next article! Enjoy Learning with JournalDev 🙂

