Hey, folks! In this article, we will learn how to calculate precision in Python, a classification error metric.
So, let us begin!
What is Precision?
Let us first understand the need for error metrics in classification and regression algorithms.
Error metrics help us analyze how accurately a particular machine learning model performs on a dataset. Different types of machine learning algorithms use different error metrics.
Error metrics for Regression–
- Mean Square Error
- Root Mean Square Error
- R squared
- Adjusted R squared, etc.
Error metrics for Classification —
- Confusion Matrix
- f1 Score, etc.
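As a quick illustration (with made-up toy labels, not the bank-loan data used later), scikit-learn exposes most of these classification metrics directly in `sklearn.metrics`:

```python
from sklearn.metrics import confusion_matrix, f1_score

# Toy ground-truth and predicted labels (illustrative values only)
y_true = [1, 0, 1, 1, 0, 1, 0, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]

# Rows are actual classes, columns are predicted classes
print(confusion_matrix(y_true, y_pred))
print(f1_score(y_true, y_pred))
```

Here the confusion matrix and f1 score come straight from the library, whereas later in this article we compute them by hand inside a helper function.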
Precision measures how many of the data values classified as positive actually belong to the positive class.
In other words, of all the labels the model predicted as positive, precision tells us what fraction are truly positive.
Have a look at the below formula–
Precision = True Positives / (True Positives + False Positives)
Here, the True Positive and False Positive counts can be read from the Confusion Matrix. The value of precision ranges between 0.0 and 1.0.
By True positive, we mean the values which are predicted as positive and are actually positive. While False Positive values are the values which are predicted as positive but are actually negative.
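To see the formula in action, here is a small sketch with hypothetical labels (the counts are chosen for illustration only); the manual calculation agrees with scikit-learn's `precision_score`:

```python
from sklearn.metrics import precision_score

# Hypothetical labels: 3 predicted positives, of which 2 are truly positive
y_true = [1, 1, 0, 0, 1]
y_pred = [1, 1, 1, 0, 0]

# TP = 2 (predicted positive and actually positive)
# FP = 1 (predicted positive but actually negative)
tp, fp = 2, 1
manual_precision = tp / (tp + fp)  # 2 / 3

print(manual_precision)
print(precision_score(y_true, y_pred))  # agrees with the manual calculation
```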
Let us now implement this in the upcoming section through an example.
Implementing Precision with a Classification Algorithm
We have implemented precision as an evaluation measure with the Decision Tree algorithm.
Let us start implementing the same!
In this example, we have used the Bank Loan Defaulter dataset. The problem is to predict loan defaulters from the bank's dataset.
1. Load the dataset
import pandas as pd
import numpy as np

loan = pd.read_csv("bank-loan.csv")  # dataset
2. Splitting the dataset
The dataset is split into a training and a testing set using the
train_test_split() function as shown below–

from sklearn.model_selection import train_test_split

X = loan.drop(['default'], axis=1)
Y = loan['default'].astype(str)

# The original snippet omitted the split call itself; the test size and
# random state below are assumed values
X_train, X_test, Y_train, Y_test = train_test_split(X, Y, test_size=0.3, random_state=0)
3. Defining Error Metrics
We have defined the Confusion Matrix and Precision calculation to be used for the evaluation of the model.
# Error metrics -- confusion matrix / FPR / FNR / f1 score
def err_metric(CM):
    TN = CM.iloc[0, 0]
    FN = CM.iloc[1, 0]
    TP = CM.iloc[1, 1]
    FP = CM.iloc[0, 1]
    precision = TP / (TP + FP)
    accuracy_model = (TP + TN) / (TP + TN + FP + FN)
    recall_score = TP / (TP + FN)
    specificity_value = TN / (TN + FP)
    False_positive_rate = FP / (FP + TN)
    False_negative_rate = FN / (FN + TP)
    f1_score = 2 * ((precision * recall_score) / (precision + recall_score))
    print("Precision value of the model: ", precision)
    print("Accuracy of the model: ", accuracy_model)
4. Applying the Decision Tree Algorithm
We have applied the Decision Tree algorithm to identify the loan defaulters from the data.
# Decision Trees
from sklearn.tree import DecisionTreeClassifier

decision = DecisionTreeClassifier(max_depth=6, class_weight='balanced',
                                  random_state=0).fit(X_train, Y_train)
target = decision.predict(X_test)
targetclass_prob = decision.predict_proba(X_test)[:, 1]
5. Evaluation of the model
Finally, we evaluate the model by building a confusion matrix and passing it to the error-metric function defined above.
confusion_matrix = pd.crosstab(Y_test, target)
err_metric(confusion_matrix)
Precision value of the model:  0.25
Accuracy of the model:  0.6028368794326241
So, the output states that 25% of the values predicted as positive by the model are actually positive.
By this, we have come to the end of this topic. Feel free to comment below in case you come across any questions. Till then, Happy Learning! 🙂