Hello readers! In our series on Machine Learning with R, today we will take a detailed look at an important classification metric: **Precision**.

So, let us begin!! 🙂

## First, what is Precision?

Before diving deep into the concept of Precision using R, let us understand the importance of error metrics.

Error metrics enable us to evaluate the performance of a Machine Learning algorithm that is used to make predictions on a real-life problem.

There are various types of error metrics for Classification and Regression Algorithms in Machine Learning.

Precision is one such classification error metric. It can be used with any supervised classification model to evaluate how the model performs on the testing dataset.

`Precision = True Positives / (True Positives + False Positives)`

Precision measures the fraction of the labels predicted as positive that are actually positive in the dataset. In other words, it tells us how many of the model's positive predictions are correct.

**The precision value ranges from 0.0 to 1.0**. As seen in the above formula, we obtain it by dividing the number of true positives by the total number of values classified as positive (true positives plus false positives).
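As a quick worked example of the formula above, the sketch below computes precision by hand. The counts are made up purely for illustration and are not taken from the loan dataset.

```r
# Hypothetical counts, chosen only to illustrate the formula
true_positives  <- 40   # predicted positive and actually positive
false_positives <- 10   # predicted positive but actually negative

precision <- true_positives / (true_positives + false_positives)
print(precision)  # 0.8
```

Here 40 of the 50 positive predictions are correct, so the precision is 0.8.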

Having understood the concept of Precision, let us now implement the same in the upcoming section!

## Calculating Precision in R for Logistic Regression

Let us now apply the Precision error metric to a Logistic Regression model.

In this example, we use a Bank Loan Defaulter prediction problem, wherein our task is to predict whether a customer is a loan defaulter or not. You can find the dataset here.

So, let us begin!

### 1. Load the dataset

At first, we need to load the dataset into the R environment. We have used the `read.csv()` function to load the dataset in RStudio.

```
#Removed all the existing objects
rm(list = ls())
#Setting the working directory
setwd("Santander Prediction/")
getwd()
#Load the dataset
train_data = read.csv("train.csv",header=TRUE)
test_data = read.csv("test.csv",header=TRUE)
```

### 2. Splitting of the dataset

Having loaded the data, let us now split the dataset into two parts: X (the training data) and Y (the testing data).

We have used the `createDataPartition()` function to split the data into 80% training and 20% testing data values.

You can find this function in the caret library of R.

```
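####SAMPLING OF DATA####
library(caret)
# train_independent and train_dependent are assumed to hold the predictor columns
# and the target column prepared from train_data (that step is not shown here)
clean_data = cbind(train_independent,train_dependent)
# Pick 80% of the row indices for training
split_index = createDataPartition(clean_data$train_dependent, p=.80, list=FALSE)
X = clean_data[split_index,]   # training data
Y = clean_data[-split_index,]  # testing data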
```
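If caret is not available, an 80/20 split can also be approximated in base R. The sketch below is not the method used in this post: it is a simple stand-in that samples 80% of the rows within each class, on a hypothetical `toy_data` frame invented for the example.

```r
set.seed(42)  # make the random split reproducible

# Hypothetical toy data standing in for clean_data (80 zeros, 20 ones)
toy_data <- data.frame(x = rnorm(100),
                       train_dependent = rep(c(0, 1), times = c(80, 20)))

# Group the row indices by class, then sample 80% of the rows in each group
rows_by_class <- split(seq_len(nrow(toy_data)), toy_data$train_dependent)
split_index <- unlist(lapply(rows_by_class, function(idx) {
  sample(idx, size = round(0.8 * length(idx)))
}))

train_part <- toy_data[split_index, ]   # 80% of each class
test_part  <- toy_data[-split_index, ]  # remaining 20% of each class
print(c(nrow(train_part), nrow(test_part)))
```

Sampling within each class keeps the share of defaulters roughly the same in both parts, which matters when the positive class is rare.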

### 3. Error metrics

Now is the time to create a function to evaluate the model. Here, we have used Precision as the mode of evaluation.

We have written a customized function that takes a **confusion matrix** and computes the precision from it. In addition, it calculates the accuracy of the model.

```
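#error metrics -- computed from a Confusion Matrix
#CM is expected to have actual labels in rows and predicted labels in columns
error_metric=function(CM)
{
  TN =CM[1,1]  #actual 0, predicted 0
  TP =CM[2,2]  #actual 1, predicted 1
  FP =CM[1,2]  #actual 0, predicted 1
  FN =CM[2,1]  #actual 1, predicted 0
  precision =(TP)/(TP+FP)
  accuracy_model =(TP+TN)/(TP+TN+FP+FN)
  print(paste("Precision value of the model: ",round(precision,2)))
  print(paste("Accuracy of the model: ",round(accuracy_model,2)))
}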
```
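To see why the function reads the cells in that order, it may help to build a small confusion matrix by hand. The sketch below uses ten made-up labels (not the loan data) with actual values in the rows and predictions in the columns, which is the layout the function above assumes.

```r
# Hypothetical actual labels and predictions for ten customers
actual    <- c(0, 0, 0, 0, 0, 0, 1, 1, 1, 1)
predicted <- c(0, 0, 0, 0, 1, 1, 1, 1, 1, 0)

# Fixing the factor levels keeps the table 2x2 even if one class is never predicted
CM <- table(factor(actual, levels = c(0, 1)),
            factor(predicted, levels = c(0, 1)))

TP <- CM[2, 2]  # actual 1, predicted 1
FP <- CM[1, 2]  # actual 0, predicted 1
precision <- TP / (TP + FP)
print(precision)  # 0.6
```

Three of the five positive predictions are correct, giving a precision of 0.6. Setting the factor levels explicitly is a defensive choice: without it, a model that never predicts class 1 would produce a one-column table and `CM[2, 2]` would fail.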

### 4. Modelling

Finally, it is time to apply our model to the split datasets. We have made use of the `glm()` function from R to fit Logistic Regression on the training data.

Further, the `predict()` function is used to generate predictions from the fitted model on the testing data.

```
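#Fit logistic regression on the training data
logit_model =glm(formula = train_dependent~. ,data =X ,family='binomial')
summary(logit_model)
#Column 201 of Y is the dependent variable (the last column after cbind),
#so Y[-201] keeps only the predictor columns
logit_predict = predict(logit_model , Y[-201] ,type = 'response' )
logit_predict <- ifelse(logit_predict > 0.5,1,0) # Probability check
#Rows: actual labels, columns: predicted labels
CM= table(Y[,201] , logit_predict)
error_metric(CM)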
```

Further, we have created the confusion matrix for evaluation using the `table()` function and at last called the user-defined function to get the precision value for the model.

**Output:**

```
"Precision value of the model: 0.71"
"Accuracy of the model: 0.91"
```
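The modelling steps above can also be tried end to end without the loan dataset. The following is a minimal sketch on simulated data: the variables `x1`, `x2` and `y`, the coefficients used to generate them, and the plain random split are all assumptions made for the example, so the resulting precision will differ from the value reported above.

```r
set.seed(123)  # reproducible simulated data

# Simulated stand-in for the loan data: two predictors and a binary outcome
n  <- 500
x1 <- rnorm(n)
x2 <- rnorm(n)
y  <- rbinom(n, size = 1, prob = plogis(-1 + 2 * x1 - x2))
sim_data <- data.frame(x1 = x1, x2 = x2, y = y)

# Simple 80/20 split by row index (random, not stratified)
train_idx <- sample(seq_len(n), size = 0.8 * n)
train_set <- sim_data[train_idx, ]
test_set  <- sim_data[-train_idx, ]

# Fit logistic regression and predict probabilities on the held-out rows
sim_model <- glm(y ~ ., data = train_set, family = "binomial")
sim_prob  <- predict(sim_model, newdata = test_set, type = "response")
sim_pred  <- ifelse(sim_prob > 0.5, 1, 0)  # same 0.5 cut-off as above

# Rows are actual labels, columns are predicted labels
sim_CM <- table(factor(test_set$y, levels = c(0, 1)),
                factor(sim_pred, levels = c(0, 1)))
sim_precision <- sim_CM[2, 2] / (sim_CM[2, 2] + sim_CM[1, 2])
print(round(sim_precision, 2))
```

Raising the 0.5 cut-off usually makes the model more conservative about predicting the positive class, which tends to increase precision at the cost of missing more actual positives.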

## Conclusion

By this, we have come to the end of this topic. Feel free to comment below in case you have any questions. For more posts related to Machine Learning with R, stay tuned!!

Do let us know your experience with the Precision implementation in the comment section.

Till then, Happy Learning!! 🙂
