Gradient Boosting Model in Python


Hey, readers! In this article, we will focus on the Gradient Boosting model in Python.

So, let us begin!


Understanding Gradient Boosting Model

Before diving into the Gradient Boosting model, let us understand why Boosting models are needed in data modeling and prediction.

Many machine learning algorithms can be used for data modeling and prediction, and Python offers a range of libraries and functions for preparing the data and building these models.

Boosting techniques in particular help us build better classifiers and regressors by combining many weak learners into one strong model for prediction.

In this technique, the model learns from its previous errors. That is, the error made in one iteration is fed into the next iteration, which tries to correct it. This introduces variety among the learners and reduces the overall error rate.

Gradient Boosting follows this Boosting concept. It can be used for both regression and classification. Every iteration is fed the errors of the previous iterations, so each new tree is trained to correct what the earlier trees got wrong. In this way, the Gradient Boosting model reduces the final error rate and gives us better predictions. The cycle of learning from the error continues until all the trees that we set out to train have been built.

Now that we understand the Gradient Boosting model, let us implement it in Python.
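To make the idea of "learning from the previous error" concrete, here is a minimal sketch of the boosting loop for a regression problem. It is not scikit-learn's internal implementation: the helper names (fit_boosted_trees, predict_boosted_trees), the synthetic sine data, and the learning rate of 0.1 are all assumptions chosen for illustration. Each shallow tree is fitted to the residuals left by the trees before it.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor


def fit_boosted_trees(X, y, n_trees=50, learning_rate=0.1, max_depth=1):
    """Fit shallow trees one after another, each on the current residuals."""
    baseline = float(np.mean(y))              # start from a constant prediction
    prediction = np.full(len(y), baseline)
    trees, errors = [], []
    for _ in range(n_trees):
        residuals = y - prediction            # errors of the ensemble so far
        tree = DecisionTreeRegressor(max_depth=max_depth, random_state=0)
        tree.fit(X, residuals)                # the new tree learns those errors
        prediction = prediction + learning_rate * tree.predict(X)
        trees.append(tree)
        errors.append(float(np.mean((y - prediction) ** 2)))
    return baseline, trees, errors


def predict_boosted_trees(X, baseline, trees, learning_rate=0.1):
    """Sum the baseline and the scaled contribution of every tree."""
    prediction = np.full(X.shape[0], baseline)
    for tree in trees:
        prediction = prediction + learning_rate * tree.predict(X)
    return prediction


# Synthetic data (hypothetical, for illustration only): a noisy sine curve
rng = np.random.default_rng(0)
X_demo = rng.uniform(0, 10, size=(200, 1))
y_demo = np.sin(X_demo[:, 0]) + rng.normal(0, 0.1, size=200)

baseline, trees, errors = fit_boosted_trees(X_demo, y_demo)
print("Training MSE after 1 tree:  ", errors[0])
print("Training MSE after 50 trees:", errors[-1])
```

The training error shrinks as trees are added, because every new tree only has to explain what the earlier trees missed.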


Implementing the Gradient Boosting model on a dataset

We will use the Bike Rental dataset for prediction. You can find the dataset here.

  1. First, we load the dataset into the Python environment.
  2. We can also prepare and pre-process the data using techniques such as outlier analysis and missing value analysis.
  3. Next, we split the dataset into training and test sets using the train_test_split() function.
  4. Then, we fit the Gradient Boosting model to the training data. Here, we use GradientBoostingRegressor() for prediction.
  5. Finally, we use MAPE (Mean Absolute Percentage Error) as the error metric to judge the accuracy of the model.
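Since the last step relies on MAPE, a small worked example may help before the full program. The numbers below are made up for illustration, and the lowercase function name mape is a hypothetical helper. MAPE averages the absolute error of each prediction as a fraction of the actual value, so it is only well defined when no actual value is zero.

```python
import numpy as np


def mape(y_actual, y_predicted):
    """Mean Absolute Percentage Error, returned as a percentage."""
    y_actual = np.asarray(y_actual, dtype=float)
    y_predicted = np.asarray(y_predicted, dtype=float)
    return np.mean(np.abs((y_actual - y_predicted) / y_actual)) * 100


# Made-up values: the per-row errors are 10%, 10% and 0%
actual = [100, 200, 400]
predicted = [110, 180, 400]

score = mape(actual, predicted)
print(round(score, 2))  # 6.67, the mean of 10, 10 and 0
```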

Example–

import numpy as np
import pandas
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import train_test_split

# Load the dataset; the features are assumed to be numeric after the
# pre-processing mentioned in step 2
bike = pandas.read_csv("day.csv")

# Separate the predictors from the target column 'cnt'
X = bike.drop(['cnt'], axis=1)
Y = bike['cnt']

# Hold out 20% of the rows for testing
X_train, X_test, Y_train, Y_test = train_test_split(X, Y, test_size=.20, random_state=0)

# Mean Absolute Percentage Error, expressed as a percentage
def MAPE(Y_actual, Y_Predicted):
    mape = np.mean(np.abs((Y_actual - Y_Predicted)/Y_actual))*100
    return mape

# Train 200 trees of depth 1 and predict on the test set
GR = GradientBoostingRegressor(n_estimators = 200, max_depth = 1, random_state = 1)
gmodel = GR.fit(X_train, Y_train)
g_predict = gmodel.predict(X_test)

GB_MAPE = MAPE(Y_test, g_predict)
Accuracy = 100 - GB_MAPE
print("MAPE: ", GB_MAPE)
print('Accuracy of Gradient Boosting: {:0.2f}%.'.format(Accuracy))

Output–

MAPE:  16.898145257306943
Accuracy of Gradient Boosting: 83.10%.

As seen in the output, the model has a Mean Absolute Percentage Error of about 16.9%. The accuracy, computed as 100 minus the MAPE, is therefore about 83.1%.

To improve the accuracy further, we can tune the hyperparameters of the model, such as n_estimators and max_depth, and compare the results across different settings.
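As a rough illustration of the hyperparameter tuning mentioned above, the sketch below runs a small grid search with scikit-learn's GridSearchCV. It uses synthetic data from make_regression rather than the Bike Rental dataset, and the grid values are arbitrary assumptions, not recommended settings. The "neg_mean_absolute_percentage_error" scorer is assumed to be available, which requires scikit-learn 0.24 or later.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import GridSearchCV, train_test_split

# Synthetic stand-in for a real dataset (hypothetical, for illustration only)
X_syn, y_syn = make_regression(n_samples=300, n_features=5, noise=10.0, random_state=0)
y_syn = y_syn - y_syn.min() + 100.0   # keep targets positive so MAPE is well defined

X_tr, X_te, y_tr, y_te = train_test_split(X_syn, y_syn, test_size=0.20, random_state=0)

# Candidate settings to compare; these values are arbitrary examples
param_grid = {
    "n_estimators": [50, 200],
    "max_depth": [1, 3],
    "learning_rate": [0.05, 0.1],
}

search = GridSearchCV(
    GradientBoostingRegressor(random_state=1),
    param_grid,
    scoring="neg_mean_absolute_percentage_error",  # higher (closer to 0) is better
    cv=3,
)
search.fit(X_tr, y_tr)

print("Best parameters:", search.best_params_)
# scikit-learn reports MAPE as a fraction, so multiply by 100 for a percentage
print("Cross-validated MAPE (%):", -search.best_score_ * 100)
```

The fitted search object can then be used like a model: search.predict(X_te) predicts with the best combination found, refitted on the whole training set.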

Conclusion

With this, we have come to the end of this topic. Feel free to comment below if you have any questions.

For more such posts related to Python programming, stay tuned.

Till then, Happy Learning!!
