Recall using R Programming

Recall Error Metric

Hello readers! In our Machine Learning with R series, today we will take a detailed look at Recall using R programming.

So, let us begin!!


First, what is Recall in R?

In the domain of data science and Machine Learning, error metrics play a very important role. Error metrics help in evaluating a model in terms of its outcome, i.e. the accuracy of its predictions.

Machine Learning comprises a variety of algorithms, broadly classified into Classification and Regression algorithms.

In our last section, we understood the functioning of Precision as an important error metric. Today, we will unveil another important error metric, Recall, using R programming.

Recall is a Classification Error Metric. With recall, we can evaluate the correctness of the predictions made by a model as well as validate them.

Recall measures how many of the actually positive values have been rightly classified in the predictions made by the algorithm.

To frame it more precisely, the role of Recall is to calculate the percentage of actual positive values that the model has also predicted correctly.

Let us consider an example of the same.

Consider a survey wherein we want to predict the type of domestic animals in a nearby area. For this, we build a model that predicts whether a stray animal is a cat or a dog.

So, with Recall, we get the percentage of animals that are actually cats (the positive class) and have been rightly labeled as cats by the model.

Have a look at the formula below!

Recall = True Positive / (True Positive + False Negative)

In technical terms, Recall measures the proportion of values that are actually positive and have also been predicted as positive.
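
To make the formula concrete, here is a minimal sketch in R, using made-up counts for the cat/dog example above (the numbers are purely illustrative):

#Made-up counts for the cat/dog example (positive class = "cat", not real data)
TP = 40   #animals that are cats and were predicted as cats
FN = 10   #animals that are cats but were predicted as dogs
recall = TP / (TP + FN)
recall    #0.8, i.e. 80% of the actual cats were correctly recovered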


Calculating Recall for Naïve Bayes

In this example, we will make use of a bank loan defaulter dataset to predict whether a customer is a loan defaulter or not.

You can find the dataset below!

Defaulter Prediction Dataset
1. First, we load the dataset into the R environment using the read.csv() function.
#Remove all existing objects from the workspace
rm(list = ls())
#Set the working directory (adjust the path to your machine)
setwd("Santander Prediction/")
getwd()

#Load the dataset
train_data = read.csv("train.csv", header = TRUE)
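
As an optional sanity check, we can inspect the loaded data before moving on; the column name target below is an assumption, so adjust it to match your copy of the dataset.

#Optional: inspect the loaded data
dim(train_data)            #number of rows and columns
str(train_data[, 1:5])     #structure of the first few columns
table(train_data$target)   #class balance, assuming the label column is named "target"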

2. Prior to modeling, we need to split the dataset into training and testing data. We make use of createDataPartition() from the caret package to split the dataset (a quick check of the split follows the code below).

###SAMPLING OF DATA###
library(caret)
#train_independent and train_dependent are assumed to hold the preprocessed
#predictors and target from the earlier cleaning steps
clean_data = cbind(train_independent, train_dependent)
#80/20 split on the target column; list = FALSE returns row indices
split_index = createDataPartition(clean_data$train_dependent, p = 0.80, list = FALSE)
X = clean_data[split_index, ]   #training data
Y = clean_data[-split_index, ]  #testing data
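
A quick optional check confirms that roughly 80% of the rows landed in the training partition X and the rest in the testing partition Y:

#Optional: verify the 80/20 split
nrow(X) / nrow(clean_data)   #should be close to 0.80
nrow(Y) / nrow(clean_data)   #should be close to 0.20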

3. Having split the data, we now create a customized Confusion Matrix function to calculate the recall and accuracy according to the formula (an illustrative call on a toy matrix follows the function).

#error metrics -- Confusion Matrix
#CM is a 2x2 table: rows = actual (0, 1), columns = predicted (0, 1)
error_metric = function(CM)
{
  TN = CM[1,1]
  TP = CM[2,2]
  FP = CM[1,2]
  FN = CM[2,1]
  recall_score = (TP)/(TP+FN)
  accuracy_model = (TP+TN)/(TP+TN+FP+FN)
  print(paste("Accuracy of the model: ", round(accuracy_model,2)))
  print(paste("Recall value of the model: ", round(recall_score,2)))
}
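
To see what the function expects, here is an illustrative call on a hand-built 2x2 matrix; the counts are invented purely for demonstration.

#Hypothetical confusion matrix: rows = actual (0, 1), columns = predicted (0, 1)
CM_demo = matrix(c(90,  5,
                   10, 45),
                 nrow = 2, byrow = TRUE)
error_metric(CM_demo)
#"Accuracy of the model:  0.9"      -> (45 + 90) / 150
#"Recall value of the model:  0.82" -> 45 / (45 + 10)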

4. Now is the time to apply our model to the dataset. We apply the Naïve Bayes model using the e1071 library.

We convert the dependent variable into a factor (category) data type using the factor() method.

Further, we use the naiveBayes() method to fit the model on the training data.

Having fit the model on the training data, we make use of the predict() method to predict values for the testing dataset.

library(e1071)
#Convert the target in the training data to a factor with levels 0 and 1
X$train_dependent = factor(X$train_dependent, levels = c(0, 1))
#Train the Naive Bayes model on the training data
naive_model = naiveBayes(train_dependent ~ ., data = X)
#Predict on the test data, dropping the target column (column 201)
naive_predict = predict(naive_model, Y[-201])

5. Finally, we create the confusion matrix from the actual and predicted values using the table() method and pass it to our customized function.

#Rows = actual values (column 201 of the test data), columns = predicted values
CM_naive = table(Y[,201], naive_predict)
error_metric(CM_naive)

Output:

"Accuracy of the model:  0.92"
"Recall value of the model:  0.65"

Conclusion

With this, we have come to the end of this topic. Feel free to comment below in case you come across any questions. For more such posts related to R, stay tuned, and till then, Happy Learning!! 🙂
