How to Calculate Precision? – Classification Error Metric

Hey, folks! In this article, we will learn how to calculate precision, a classification error metric, in Python.

So, let us begin!

What is Precision?

Let us first understand why error metrics are needed for classification and regression algorithms.

Error metrics help us evaluate how well a particular machine learning model performs on a dataset. Different types of machine learning algorithms call for different error metrics.

Error metrics for regression:

  • Mean Squared Error
  • Root Mean Squared Error
  • R squared
  • Adjusted R squared, etc.

Error metrics for classification:

  • Confusion Matrix
  • Accuracy
  • Precision
  • Recall
  • F1 Score, etc.

Precision measures how many of the data values that the model labeled as positive are actually positive.

In other words, precision tells us what fraction of the model's positive predictions are correct.

Have a look at the formula below:

Precision = True Positives / (True Positives + False Positives)

Here, the True Positive and False Positive counts can be read from the Confusion Matrix. The value of Precision ranges from 0.0 to 1.0, where 1.0 means that every positive prediction was correct.

True Positives are the values that are predicted as positive and are actually positive, while False Positives are the values that are predicted as positive but are actually negative.
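To make the formula concrete, here is a minimal sketch in plain Python that counts True Positives and False Positives directly from two label lists. The function name `precision_from_labels` and the toy labels are hypothetical and made up purely for illustration; they are not part of the bank loan example.

```python
def precision_from_labels(y_true, y_pred, positive=1):
    """Return TP / (TP + FP) for the given positive label."""
    # True Positives: predicted positive and actually positive
    tp = sum(1 for t, p in zip(y_true, y_pred) if p == positive and t == positive)
    # False Positives: predicted positive but actually negative
    fp = sum(1 for t, p in zip(y_true, y_pred) if p == positive and t != positive)
    if tp + fp == 0:
        # The model made no positive predictions, so the ratio is undefined
        return None
    return tp / (tp + fp)


# Hypothetical labels (1 = positive, 0 = negative)
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 1, 0, 1, 0]

print(precision_from_labels(y_true, y_pred))  # 3 TP and 1 FP -> 0.75
```

Note that the sketch returns None when the model predicts no positives at all, because the denominator TP + FP would be zero in that case.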

Let us now implement this in the upcoming section through an example.

Implementing Precision with a Classification Algorithm

We will use Precision as an evaluation measure for a Decision Tree classifier.

Let us get started!

In this example, we use the Bank Loan Defaulter dataset. The task is to predict the loan defaulters from the bank's data.

1. Load the dataset

Here, we load the Bank Loan dataset into the environment using the pandas.read_csv() function.

import pandas as pd
import numpy as np
loan = pd.read_csv("bank-loan.csv") # dataset

2. Splitting the dataset

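We separate the features from the target column and split the dataset into training and testing sets using the train_test_split() function as shown below:

from sklearn.model_selection import train_test_split
X = loan.drop(['default'], axis=1)   # features
Y = loan['default'].astype(str)      # target, cast to string labels
# Assumption: the split ratio is not stated in this article; test_size=0.2 is used for illustration
X_train, X_test, Y_train, Y_test = train_test_split(X, Y, test_size=0.2, random_state=0)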

3. Defining Error Metrics

We define a function that takes the Confusion Matrix and computes Precision, along with a few other metrics, for the evaluation of the model.

# Error metrics -- Precision, Accuracy, Recall, Specificity, FPR, FNR, F1 score
# CM is expected to have actual labels as rows and predicted labels as columns
def err_metric(CM):
    TN = CM.iloc[0,0]   # actual negative, predicted negative
    FN = CM.iloc[1,0]   # actual positive, predicted negative
    TP = CM.iloc[1,1]   # actual positive, predicted positive
    FP = CM.iloc[0,1]   # actual negative, predicted positive
    precision = TP/(TP+FP)
    accuracy_model = (TP+TN)/(TP+TN+FP+FN)
    recall_score = TP/(TP+FN)
    specificity_value = TN/(TN+FP)
    False_positive_rate = FP/(FP+TN)
    False_negative_rate = FN/(FN+TP)
    f1_score = 2*((precision*recall_score)/(precision+recall_score))
    print("Precision value of the model: ",precision)
    print("Accuracy of the model: ",accuracy_model)
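The index positions used inside err_metric() depend on how pd.crosstab() lays out its result: the first argument becomes the rows and the second becomes the columns. The following sketch shows that layout on a handful of hypothetical string labels (they are invented for illustration and do not come from the bank loan data).

```python
import pandas as pd

# Hypothetical string labels, mirroring the astype(str) cast applied to Y
actual = pd.Series(["0", "0", "1", "1", "0", "1"], name="actual")
predicted = pd.Series(["0", "1", "1", "0", "0", "1"], name="predicted")

# Rows are the actual labels, columns are the predicted labels
cm = pd.crosstab(actual, predicted)

tn, fp = cm.iloc[0, 0], cm.iloc[0, 1]   # first row: actual "0"
fn, tp = cm.iloc[1, 0], cm.iloc[1, 1]   # second row: actual "1"

print(cm)
print("TN, FP, FN, TP =", tn, fp, fn, tp)
```

One caveat worth keeping in mind: pd.crosstab() only creates rows and columns for labels that actually occur, so if a model never predicts one of the classes, the table has a single column and the positional lookups above would fail.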

4. Modelling

We apply the Decision Tree algorithm to identify the loan defaulters from the data.

#Decision Trees
from sklearn.tree import DecisionTreeClassifier
decision = DecisionTreeClassifier(max_depth=6, class_weight='balanced', random_state=0).fit(X_train, Y_train)
target = decision.predict(X_test)                          # predicted class labels
targetclass_prob = decision.predict_proba(X_test)[:, 1]    # predicted probability of the second class

5. Evaluation of model

Finally, we evaluate the model by building the confusion matrix and passing it to the err_metric() function defined above.

confusion_matrix = pd.crosstab(Y_test,target)
err_metric(confusion_matrix)

Output:

Precision value of the model:  0.25
Accuracy of the model:  0.6028368794326241

So, the output states that only 25% of the values predicted as positive by the model are actually positive. In other words, three out of every four predicted defaulters are false positives.
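As a cross-check on a hand-written calculation, scikit-learn also provides precision_score() in sklearn.metrics. The sketch below uses hypothetical string labels as stand-ins for Y_test and target (the names y_test_demo and target_demo are invented here). Because the labels are strings, the positive class has to be named explicitly through the pos_label parameter.

```python
from sklearn.metrics import precision_score

# Hypothetical string labels standing in for Y_test and target
y_test_demo = ["0", "1", "0", "1", "0", "0", "1", "0"]
target_demo = ["1", "1", "0", "0", "1", "0", "1", "1"]

# pos_label names the positive class, since the labels are strings rather than 0/1 integers
precision = precision_score(y_test_demo, target_demo, pos_label="1")
print(precision)  # 2 TP out of 5 positive predictions
```

With the real data, the same call on Y_test and target should agree with the value printed by err_metric(), assuming '1' is the label used for defaulters.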


With this, we have come to the end of this topic. Feel free to comment below in case you come across any questions. Till then, Happy Learning! 🙂