Hello, folks! Good to see you all again! 🙂 Today, we will be focusing on an important error metric: recall in Python.
Let us begin!
First, what is an Error Metric?
In data science and machine learning, where we build models to make predictions and solve real-life problems, it is very important to understand how each model or algorithm performs on the data.
Now, the question arises: how are we going to check the effect of every model on our data?
This is where error metrics come into the picture. Error metrics are the different measures through which we can check the accuracy of a model and how close its predictions are to the actual data values.
There are various error metrics for regression as well as classification models.
Today, we will be focusing on Recall in Python as the error metric!
Recall in Python as an Error Metric!
“Recall” is a classification error metric. It evaluates the outcome of classification algorithms, that is, models for which the target/response value is a category.
Basically, recall is the fraction of the actually positive samples that the model also predicts as positive. In other words, it represents the percentage of the positive-labelled values that are predicted correctly.
Let us try to understand this with the help of an example! Consider a variable ‘Pole’ with the values ‘True’ and ‘False’. The job of the recall metric is to find out how many of the samples that are actually labelled True are also predicted as True by the model.
So, technically speaking, recall is the error metric that accounts for the ability of the classifier to predict the positive-labelled samples correctly.
Recall = True Positive/ (True Positive + False Negative)
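To make the formula concrete, here is a minimal sketch that computes recall from raw counts. The counts and the helper name recall_from_counts are made up for illustration and are not part of any library.

```python
def recall_from_counts(true_positive, false_negative):
    """Recall = TP / (TP + FN); returns 0.0 when there are no actual positives."""
    actual_positives = true_positive + false_negative
    if actual_positives == 0:
        return 0.0
    return true_positive / actual_positives


# Hypothetical counts: 30 positive samples found, 10 positive samples missed.
recall = recall_from_counts(true_positive=30, false_negative=10)
print("Recall:", recall)  # 30 / (30 + 10) = 0.75
```

Note that false positives do not appear in the formula at all, so recall on its own says nothing about how many negative samples were wrongly flagged as positive.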
Let us now implement the concept of recall with a couple of examples in the sections below.
1. Recall with Decision Trees
Let us begin by importing the dataset! We have used the Bike Prediction dataset and imported it using the pandas.read_csv() function.
You can find the dataset here.
Loading the dataset
import pandas as pd

# Load the dataset; the later steps refer to it as `bike` and to pandas as `pd`.
bike = pd.read_csv("Bike.csv")
Splitting the dataset
We have split the dataset into training and testing sets using the train_test_split() function.
# Separating the dependent and independent data variables into two dataframes.
from sklearn.model_selection import train_test_split

X = bike.drop(['cnt'], axis=1)
Y = bike['cnt']

# Splitting the dataset into 80% training data and 20% testing data.
X_train, X_test, Y_train, Y_test = train_test_split(X, Y, test_size=.20, random_state=0)
Now, time to define the Error Metrics!
We have created a custom function ‘err_metric’ that calculates the precision, recall, accuracy and f1 score (along with the specificity and the false positive and false negative rates) from a confusion matrix, as shown below:
# Error metrics from a 2x2 confusion matrix (rows: actual, columns: predicted)
def err_metric(CM):
    TN = CM.iloc[0,0]
    FN = CM.iloc[1,0]
    TP = CM.iloc[1,1]
    FP = CM.iloc[0,1]
    precision = (TP)/(TP+FP)
    accuracy_model = (TP+TN)/(TP+TN+FP+FN)
    recall_score = (TP)/(TP+FN)
    specificity_value = (TN)/(TN + FP)
    False_positive_rate = (FP)/(FP+TN)
    False_negative_rate = (FN)/(FN+TP)
    f1_score = 2*((precision * recall_score)/(precision + recall_score))
    print("Precision value of the model: ", precision)
    print("Accuracy of the model: ", accuracy_model)
    print("Recall value of the model: ", recall_score)
    print("Specificity of the model: ", specificity_value)
    print("False Positive rate of the model: ", False_positive_rate)
    print("False Negative rate of the model: ", False_negative_rate)
    print("f1 score of the model: ", f1_score)
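The indexing in the function above relies on how pd.crosstab() lays out the confusion matrix: rows are the actual classes and columns are the predicted classes, both in sorted order. Here is a small sketch of that layout, using made-up binary labels purely for illustration.

```python
import pandas as pd

# Hypothetical binary labels, used only to show the layout of the matrix.
actual = pd.Series([0, 0, 0, 1, 1, 1, 1, 0], name="actual")
predicted = pd.Series([0, 1, 0, 1, 0, 1, 1, 0], name="predicted")

# Rows are actual classes, columns are predicted classes (both sorted: 0, 1).
cm = pd.crosstab(actual, predicted)
print(cm)

TN = cm.iloc[0, 0]  # actual 0, predicted 0
FP = cm.iloc[0, 1]  # actual 0, predicted 1
FN = cm.iloc[1, 0]  # actual 1, predicted 0
TP = cm.iloc[1, 1]  # actual 1, predicted 1

recall = TP / (TP + FN)
print("Recall:", recall)  # 3 / (3 + 1) = 0.75
```

One thing to watch for: crosstab only creates rows and columns for values that actually occur, so if the model never predicts one of the classes, the matrix is no longer 2x2 and positional indexing like this will fail or pick the wrong cell.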
Implementing the model!
Let us now apply the Decision Tree model on our dataset. We have used the DecisionTreeClassifier() class to fit it on our data.
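# Decision Trees
import pandas as pd
from sklearn.tree import DecisionTreeClassifier

decision = DecisionTreeClassifier(max_depth=6, class_weight='balanced', random_state=0).fit(X_train, Y_train)
target = decision.predict(X_test)
targetclass_prob = decision.predict_proba(X_test)[:, 1]

# Confusion matrix: rows are the actual classes, columns are the predicted classes.
confusion_matrix = pd.crosstab(Y_test, target)
err_metric(confusion_matrix)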
As seen below, we get a recall of about 0.58, i.e. 58%, which means the model correctly identifies about 58% of the samples that are actually positive.
Precision value of the model: 0.25
Accuracy of the model: 0.6028368794326241
Recall value of the model: 0.5769230769230769
Specificity of the model: 0.6086956521739131
False Positive rate of the model: 0.391304347826087
False Negative rate of the model: 0.4230769230769231
f1 score of the model: 0.3488372093023256
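If you do not have the dataset at hand, the same workflow can be sketched on synthetic data. The example below is only an illustration under assumptions: it uses make_classification() to generate a binary problem in place of the bike data, and it cross-checks the recall derived from the confusion matrix against sklearn's recall_score(). The numbers it prints will differ from the output above.

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.metrics import recall_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic binary data standing in for the article's dataset (an assumption).
X, y = make_classification(n_samples=500, n_features=8, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.20, random_state=0
)

model = DecisionTreeClassifier(max_depth=6, class_weight="balanced", random_state=0)
model.fit(X_train, y_train)
y_pred = model.predict(X_test)

# Confusion matrix: rows = actual, columns = predicted.
# reindex() guarantees a 2x2 table even if one class is never predicted.
cm = pd.crosstab(
    pd.Series(y_test, name="actual"), pd.Series(y_pred, name="predicted")
).reindex(index=[0, 1], columns=[0, 1], fill_value=0)

TP = cm.iloc[1, 1]
FN = cm.iloc[1, 0]
manual_recall = TP / (TP + FN)

# Recall from sklearn, with class 1 as the positive label (the default).
sklearn_recall = recall_score(y_test, y_pred)

print("Manual recall: ", manual_recall)
print("sklearn recall:", sklearn_recall)
```

Both values should agree, which is a handy sanity check that the cells of the confusion matrix are being read in the right order.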
2. Recall in Python using sklearn library
Python's sklearn library offers us the recall_score() function, which returns the recall value for a set of actual and predicted values.
recall_score(x, y, average='weighted')
- x: Actual values
- y: Predicted set of values
- average: string, [None, ‘binary’ (default), ‘micro’, ‘macro’, ‘samples’, ‘weighted’]
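To see what the average parameter changes, here is a minimal sketch on a made-up three-class example. The label lists are hypothetical and chosen only so that the per-class values are easy to verify by hand.

```python
from sklearn.metrics import recall_score

# Hypothetical three-class labels, chosen only to illustrate averaging.
actual = [0, 0, 0, 0, 1, 1, 2, 2, 2, 2]
predicted = [0, 0, 0, 1, 1, 2, 2, 2, 0, 0]

# average=None returns one recall value per class: [0.75, 0.5, 0.5]
per_class = recall_score(actual, predicted, average=None)

# 'macro': plain mean of the per-class recalls.
macro = recall_score(actual, predicted, average="macro")

# 'micro': total true positives divided by the total number of samples.
micro = recall_score(actual, predicted, average="micro")

# 'weighted': per-class recalls weighted by how often each class occurs.
weighted = recall_score(actual, predicted, average="weighted")

print("Per class:", per_class)
print("Macro:    ", macro)
print("Micro:    ", micro)
print("Weighted: ", weighted)
```

The default, average='binary', only works for two-class targets; for multiclass targets like this one you have to pick one of the other options. For single-label data, the micro and weighted recalls coincide, since both reduce to the share of samples predicted correctly.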
In the below example, x refers to the actual set of values while y represents the predicted values.
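from sklearn.metrics import recall_score

x = [10,20,30,40,50,60]  # actual values
y = [10,21,30,40,50,80]  # predicted values

# Print the result explicitly so it also shows up when run as a script.
print("Recall value:", recall_score(x, y, average='weighted'))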
Recall value: 0.6666666666666666
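Where does 0.666 come from? Since recall is a classification metric, every distinct number is treated as a class label, not as a numeric value, so predicting 21 instead of 20 counts as a miss just like predicting 80 instead of 60. The sketch below breaks the score down per label; passing labels= to restrict the result to the labels that occur in x is a choice made here for readability.

```python
from sklearn.metrics import recall_score

x = [10, 20, 30, 40, 50, 60]  # actual values
y = [10, 21, 30, 40, 50, 80]  # predicted values

# Only look at the labels that actually occur in x.
labels = sorted(set(x))
per_label = recall_score(x, y, labels=labels, average=None)

for label, value in zip(labels, per_label):
    print(label, value)

# Every label occurs exactly once in x, so the weighted average
# is simply the mean of the per-label recalls: 4 hits out of 6.
weighted = sum(per_label) / len(per_label)
print("Weighted recall:", weighted)
```

Four of the six labels (10, 30, 40 and 50) are recovered, while 20 and 60 are missed, which gives 4/6.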
With this, we have come to the end of this topic. Feel free to comment below in case you have any questions.
For a deeper understanding, try out the concept of recall on various datasets and do let us know your experience in the comment box!
Till then, Stay tuned!
See you in the next article! Enjoy Learning with JournalDev 🙂