F1 Score – Classification Error Metric


Hey, readers! Today we will focus on an important metric for classification algorithms: the F1 score, and how to compute it in Python. So, let us begin!


What is F1 score?

The F1 score is a classification metric that, like any other evaluation metric, helps us judge the performance of a machine learning model. It is most commonly reported for binary classification, where one class is treated as the positive class.

It combines precision and recall into a single number: it is defined as the harmonic mean of the two. It is especially useful when the classes are imbalanced, because accuracy alone can look high for a model that mostly predicts the majority class.

Have a look at the formula below:

F1 = 2 * (precision * recall) / (precision + recall)

The F1 score ranges from 0 to 1 and rises as precision and recall rise. Because it is a harmonic mean, it stays low whenever either precision or recall is low.
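
To make the formula concrete, here is a minimal sketch that evaluates it for two pairs of made-up precision and recall values. The helper name f1_from_precision_recall and the zero guard are assumptions added for illustration.

```python
def f1_from_precision_recall(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall; returns 0.0 when both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * (precision * recall) / (precision + recall)


# Equal precision and recall: F1 equals that shared value.
balanced = f1_from_precision_recall(0.8, 0.8)

# Lopsided values: the arithmetic mean would be 0.5,
# but F1 is pulled toward the smaller of the two.
lopsided = f1_from_precision_recall(0.9, 0.1)

print(round(balanced, 4))  # 0.8
print(round(lopsided, 4))  # 0.18
```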

A high score indicates that the model identifies most of the positive cases without raising many false alarms, which is what matters most when the classes are imbalanced.
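
The effect of class imbalance can be seen on a small made-up example. The sketch below assumes 95 negatives and 5 positives and compares a "model" that always predicts the majority class with one that finds most of the positives. All label lists here are hypothetical.

```python
from sklearn.metrics import accuracy_score, f1_score

# Hypothetical ground truth: 95 negatives (0) followed by 5 positives (1).
y_true = [0] * 95 + [1] * 5

# A degenerate "model" that always predicts the majority class.
y_majority = [0] * 100

# A model with 2 false alarms among the negatives and 4 of 5 positives found.
y_model = [0] * 93 + [1] * 2 + [1] * 4 + [0] * 1

acc_majority = accuracy_score(y_true, y_majority)
f1_majority = f1_score(y_true, y_majority, zero_division=0)
acc_model = accuracy_score(y_true, y_model)
f1_model = f1_score(y_true, y_model, zero_division=0)

print(f"majority: accuracy={acc_majority:.2f}, F1={f1_majority:.2f}")  # accuracy=0.95, F1=0.00
print(f"model:    accuracy={acc_model:.2f}, F1={f1_model:.2f}")        # accuracy=0.97, F1=0.73
```

The two accuracies are almost the same, while the F1 scores are far apart. That gap is why F1 is preferred over accuracy on imbalanced data.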

Let us now focus on the practical implementation of the same in the upcoming section.


Applying F1 Score on Loan Dataset

Here, we implement the evaluation metric on a loan defaulter prediction problem. You can find the dataset here.

1. Load the dataset

We use the pandas.read_csv() function to load the dataset into the environment.

import pandas as pd
import numpy as np
loan = pd.read_csv("Bank-loan.csv")

2. Split the dataset

Next, we split the dataset into training and test sets using the train_test_split() function as shown:

from sklearn.model_selection import train_test_split
X = loan.drop(['default'],axis=1) 
Y = loan['default'].astype(str)
X_train, X_test, Y_train, Y_test = train_test_split(X, Y, test_size=.20, random_state=0)

3. Defining the error metrics

Here, we define a custom function that reads the four cells of the confusion matrix and derives precision, accuracy, recall and the F1 score from them.

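# Error metrics from a confusion matrix: precision, accuracy, recall, F1 score
def err_metric(CM):
    # CM is expected to come from pd.crosstab(actual, predicted):
    # rows are the actual classes, columns are the predicted classes
    TN = CM.iloc[0,0]
    FN = CM.iloc[1,0]
    TP = CM.iloc[1,1]
    FP = CM.iloc[0,1]
    precision = TP/(TP+FP)
    accuracy_model = (TP+TN)/(TP+TN+FP+FN)
    recall_score = TP/(TP+FN)
    # Harmonic mean of precision and recall
    f1_score = 2*((precision*recall_score)/(precision+recall_score))
    print("f1 score of the model: ",f1_score)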
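
As a quick sanity check of the cell positions used above, here is a minimal sketch on a tiny hand-made example. The label values and the names actual, predicted and cm are made up for illustration and are not part of the loan dataset.

```python
import pandas as pd

# Hypothetical labels, stored as strings like Y in the split step.
actual = pd.Series(["0", "0", "0", "1", "1", "1", "0", "1"], name="actual")
predicted = pd.Series(["0", "1", "0", "1", "0", "1", "0", "1"], name="predicted")

# Rows are actual classes, columns are predicted classes.
cm = pd.crosstab(actual, predicted)

tn, fp = cm.iloc[0, 0], cm.iloc[0, 1]  # first row: actual "0"
fn, tp = cm.iloc[1, 0], cm.iloc[1, 1]  # second row: actual "1"

precision = tp / (tp + fp)
recall = tp / (tp + fn)
f1 = 2 * precision * recall / (precision + recall)

print(cm)
print("F1:", f1)  # F1: 0.75
```

Note that this layout only holds when both classes appear in both the actual and the predicted labels. If the model never predicts one class, the crosstab has a single column and the positional lookups fail.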

4. Modelling

We fit a Decision Tree classifier on the training data as shown below:

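# Decision Tree
from sklearn.tree import DecisionTreeClassifier
decision = DecisionTreeClassifier(max_depth=6, class_weight='balanced', random_state=0).fit(X_train, Y_train)
target = decision.predict(X_test)  # predicted class labels
targetclass_prob = decision.predict_proba(X_test)[:, 1]  # predicted probability of the second class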

5. Evaluation of the model

Having fitted the model, we now evaluate it with the function defined in the section above.

confusion_matrix = pd.crosstab(Y_test,target)
err_metric(confusion_matrix)

Output:

f1 score of the model:  0.3488372093023256
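
The manual calculation can be cross-checked against sklearn. Since the loan file is not bundled with this article, the sketch below runs the same steps on a synthetic dataset from make_classification; the dataset, its size and its class ratio are assumptions, so the printed score will not match the output above.

```python
from sklearn.datasets import make_classification
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the loan data (assumption: roughly 20% positives).
X, y = make_classification(n_samples=500, n_features=8, weights=[0.8, 0.2], random_state=0)
Y = y.astype(str)  # string labels "0" and "1", as in the split step

X_train, X_test, Y_train, Y_test = train_test_split(X, Y, test_size=0.20, random_state=0)
model = DecisionTreeClassifier(max_depth=6, class_weight="balanced", random_state=0)
target = model.fit(X_train, Y_train).predict(X_test)

# Manual F1 for the positive class "1": 2*TP / (2*TP + FP + FN),
# which is algebraically the same as the harmonic mean of precision and recall.
tp = int(((Y_test == "1") & (target == "1")).sum())
fp = int(((Y_test == "0") & (target == "1")).sum())
fn = int(((Y_test == "1") & (target == "0")).sum())
denominator = 2 * tp + fp + fn
manual_f1 = 2 * tp / denominator if denominator else 0.0

# The labels are strings, so the positive class is named explicitly via pos_label.
sklearn_f1 = f1_score(Y_test, target, pos_label="1", zero_division=0)

print("manual :", manual_f1)
print("sklearn:", sklearn_f1)
```

The two printed values should agree. With string labels, passing pos_label="1" matters, because the default pos_label=1 is an integer and does not match either of the string classes.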

F1 Score with sklearn library

In this example, we use the built-in f1_score() function from the sklearn library. It computes the score directly from the true and predicted labels, so we do not have to calculate precision and recall ourselves. The integers below are treated as class labels, and average='macro' returns the unweighted mean of the per-class F1 scores.

from sklearn.metrics import f1_score
x = [0, 1, 20, 30, 40]
y = [1, 19, 20, 30, 39]
res = f1_score(x, y, average='macro')
print("F1 score:", res)

Output:

F1 score: 0.2857142857142857
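
To see where 0.2857 comes from, the sketch below asks f1_score() for the per-class scores (average=None) and compares the macro and micro averages on the same x and y. The variable names per_class, macro and micro are only illustrative.

```python
from sklearn.metrics import f1_score

x = [0, 1, 20, 30, 40]   # treated as true class labels
y = [1, 19, 20, 30, 39]  # treated as predicted class labels

# One F1 score per distinct label, in sorted order: 0, 1, 19, 20, 30, 39, 40.
per_class = f1_score(x, y, average=None, zero_division=0)

# Macro: unweighted mean of the per-class scores (2 perfect classes out of 7).
macro = f1_score(x, y, average="macro", zero_division=0)

# Micro: pools all decisions first; here 2 of the 5 labels match.
micro = f1_score(x, y, average="micro", zero_division=0)

print(per_class)  # only labels 20 and 30 score 1.0, the rest score 0.0
print(macro)      # 2/7, about 0.2857
print(micro)      # 0.4
```

Only the labels 20 and 30 are predicted correctly, so two of the seven per-class scores are 1 and the rest are 0, which gives the macro average of 2/7.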

Conclusion

With this, we have come to the end of this topic. Feel free to comment below in case you come across any questions.

Till then, stay tuned and keep learning! 🙂

