Hey, readers! Today, we will be focusing on an important evaluation metric for classification algorithms: F1 Score in Python. So, let us begin!
What is F1 score?
F1 score is a classification metric that, like any other evaluation metric, helps us judge the performance of a machine learning model. In its basic form, it is defined for binary classification.
It is a combination of precision and recall metrics and is termed as the harmonic mean of precision and recall. It is mostly used when the data is imbalanced, because on such data accuracy alone can look good even when the model misses most of the minority class.
Have a look at the formula below:
F1 = 2 * (precision * recall) / (precision + recall)
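F1 score ranges from 0 to 1 and increases as both the precision and the recall of a model rise. Because it is a harmonic mean, a low value in either one pulls the score down.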
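To make the harmonic mean a bit more concrete, here is a minimal sketch of the formula in plain Python. The helper name f1_from_precision_recall and the precision and recall values are made up for illustration; they do not come from the loan dataset.

```python
def f1_from_precision_recall(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (0.0 when both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * (precision * recall) / (precision + recall)


# Illustrative values only
balanced = f1_from_precision_recall(0.8, 0.8)   # precision and recall agree
lopsided = f1_from_precision_recall(1.0, 0.2)   # perfect precision, poor recall
arithmetic_mean = (1.0 + 0.2) / 2               # plain average, for comparison

print("balanced F1:", balanced)
print("lopsided F1:", lopsided)
print("arithmetic mean of the lopsided pair:", arithmetic_mean)
```

In the lopsided case the F1 score lands well below the plain average of the two values, which is the behaviour we want from a metric that should not reward a model for doing well on only one of the two.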
A high score indicates that the model finds most of the positive cases while raising few false alarms, which is exactly what we care about when the classes are imbalanced.
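As a rough illustration of why F1 is preferred over accuracy on imbalanced data, the sketch below scores a model that always predicts the majority class. The labels are made up (5 positives out of 100) and are not taken from the loan dataset.

```python
# Made-up imbalanced labels: 5 positives, 95 negatives
y_true = [1] * 5 + [0] * 95
# A lazy "model" that predicts the majority class every time
y_pred = [0] * 100

pairs = list(zip(y_true, y_pred))
tp = sum(1 for t, p in pairs if t == 1 and p == 1)
fp = sum(1 for t, p in pairs if t == 0 and p == 1)
fn = sum(1 for t, p in pairs if t == 1 and p == 0)
tn = sum(1 for t, p in pairs if t == 0 and p == 0)

accuracy = (tp + tn) / len(y_true)
# Guard against division by zero when there are no positive predictions
precision = tp / (tp + fp) if (tp + fp) else 0.0
recall = tp / (tp + fn) if (tp + fn) else 0.0
f1 = 2 * precision * recall / (precision + recall) if (precision + recall) else 0.0

print("accuracy:", accuracy)
print("f1 score:", f1)
```

The accuracy looks excellent even though not a single positive case was found, while the F1 score drops to zero.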
Let us now focus on the practical implementation of the same in the upcoming section.
Applying F1 Score on Loan Dataset
Here, we will implement the evaluation metrics on Loan Defaulter Prediction. You can find the dataset here.
1. Load the dataset
We have used pandas.read_csv() function to load the dataset into the environment.
import pandas as pd
import numpy as np

loan = pd.read_csv("Bank-loan.csv")
2. Split the dataset
Further, we have split the dataset into training and test sets using the train_test_split() function as shown:
from sklearn.model_selection import train_test_split

X = loan.drop(['default'], axis=1)
Y = loan['default'].astype(str)
X_train, X_test, Y_train, Y_test = train_test_split(X, Y, test_size=.20, random_state=0)
3. Defining the error metrics
Here, we have defined the confusion matrix and other error metrics using customized functions.
# Error metrics: confusion matrix, accuracy, precision, recall, F1 score
def err_metric(CM):
    # CM comes from pd.crosstab(actual, predicted):
    # rows are the actual classes, columns are the predicted classes
    TN = CM.iloc[0, 0]
    FN = CM.iloc[1, 0]
    TP = CM.iloc[1, 1]
    FP = CM.iloc[0, 1]
    precision = TP / (TP + FP)
    accuracy_model = (TP + TN) / (TP + TN + FP + FN)
    recall_score = TP / (TP + FN)
    f1_score = 2 * ((precision * recall_score) / (precision + recall_score))
    print("f1 score of the model: ", f1_score)
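The iloc positions used above depend on how pd.crosstab() lays out the table, so a small sketch may help. The labels here are made up, and stored as strings in the same way Y is; the variable names are only for illustration.

```python
import pandas as pd

# Made-up labels, stored as strings like the target column in this article
actual = pd.Series(['0', '0', '0', '1', '1', '1', '1', '0'], name='actual')
predicted = pd.Series(['0', '1', '0', '1', '1', '0', '1', '0'], name='predicted')

# Rows are the actual classes, columns are the predicted classes
cm = pd.crosstab(actual, predicted)
print(cm)

tn = cm.iloc[0, 0]  # actual '0', predicted '0'
fp = cm.iloc[0, 1]  # actual '0', predicted '1'
fn = cm.iloc[1, 0]  # actual '1', predicted '0'
tp = cm.iloc[1, 1]  # actual '1', predicted '1'

precision = tp / (tp + fp)
recall = tp / (tp + fn)
f1 = 2 * precision * recall / (precision + recall)
print("F1:", f1)
```

One thing to keep in mind: crosstab only creates rows and columns for values that actually occur, so if a model never predicts one of the classes, the table has a single column and the positional lookups would fail.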
4. Applying the model
We have applied the Decision Tree algorithm on the dataset as shown below:
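# Decision Trees
from sklearn.tree import DecisionTreeClassifier

decision = DecisionTreeClassifier(max_depth=6, class_weight='balanced', random_state=0).fit(X_train, Y_train)
target = decision.predict(X_test)
# Predicted probability of the second class for each test row
targetclass_prob = decision.predict_proba(X_test)[:, 1]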
5. Evaluation of the model
Having applied the model, we now evaluate it with the metrics defined in the above section.
confusion_matrix = pd.crosstab(Y_test, target)
err_metric(confusion_matrix)
f1 score of the model: 0.3488372093023256
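If you would like to cross-check a hand-written formula like this one, a sketch along the following lines may help. It uses made-up string labels rather than the loan data, and it assumes that '1' marks the positive (defaulter) class. Since the target was cast to str, pos_label is passed explicitly, because the default pos_label=1 (an integer) would not match the string labels.

```python
from sklearn.metrics import f1_score

# Made-up string labels, mirroring the astype(str) target in this article
y_true = ['0', '0', '0', '1', '1', '1', '1', '0', '0', '0']
y_pred = ['0', '1', '0', '1', '0', '0', '1', '0', '0', '1']

# Manual computation, treating '1' as the positive class (an assumption)
tp = sum(1 for t, p in zip(y_true, y_pred) if t == '1' and p == '1')
fp = sum(1 for t, p in zip(y_true, y_pred) if t == '0' and p == '1')
fn = sum(1 for t, p in zip(y_true, y_pred) if t == '1' and p == '0')
precision = tp / (tp + fp)
recall = tp / (tp + fn)
manual_f1 = 2 * precision * recall / (precision + recall)

# Same score from scikit-learn, with the positive label named explicitly
sk_f1 = f1_score(y_true, y_pred, pos_label='1')

print("manual :", manual_f1)
print("sklearn:", sk_f1)
```

Both numbers should agree, which is a quick way to confirm that TP, FP and FN were read from the right cells of the confusion matrix.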
F1 Score with sklearn library
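In this example, we have used the built-in function from the sklearn library to calculate the F1 score. The f1_score() method computes the score directly from the true and predicted labels, without us having to compute the precision and recall values explicitly. Since the labels here take more than two values, we pass average='macro', which averages the F1 scores of the individual classes.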
from sklearn.metrics import f1_score

x = [0, 1, 20, 30, 40]   # true labels
y = [1, 19, 20, 30, 39]  # predicted labels
res = f1_score(x, y, average='macro')
print("F1 score:", res)
F1 score: 0.2857142857142857
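To see where a macro score like this comes from, we can ask f1_score() for the per-class values with average=None and compare a few averaging options. This is only a sketch on the same x and y lists; the zero_division=0 argument is an addition here, used to score classes that are never predicted as 0.

```python
from sklearn.metrics import f1_score

x = [0, 1, 20, 30, 40]   # true labels (same as in the example above)
y = [1, 19, 20, 30, 39]  # predicted labels

# Every label that appears in either list counts as a class
labels = sorted(set(x) | set(y))

# One F1 score per class, in the order given by labels
per_class = f1_score(x, y, labels=labels, average=None, zero_division=0)
for label, score in zip(labels, per_class):
    print("class", label, "F1:", score)

macro = f1_score(x, y, average='macro', zero_division=0)        # plain mean over classes
micro = f1_score(x, y, average='micro', zero_division=0)        # from pooled TP/FP/FN
weighted = f1_score(x, y, average='weighted', zero_division=0)  # mean weighted by class support

print("macro:", macro)
print("micro:", micro)
print("weighted:", weighted)
```

Only two of the seven classes are predicted correctly, so the macro average is 2 divided by 7, which matches the value printed above.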
With this, we have come to the end of this topic. Feel free to comment below in case you have any questions.
Till then, Stay tuned and Keep Learning!! 🙂