XGBoost in R – All you need to know!


Hello, readers! In this article, we will be focusing on XGBoost in R programming. So, let us begin!


Understanding XGBoost

Before diving deep into XGBoost, let us first understand where it comes from and the concepts it builds on.

In the domain of Data Science and Machine Learning, we use various algorithms to predict outcomes for real-life problems.

Many classification and regression algorithms make prediction tasks easier.

Even so, when choosing a machine learning algorithm, we need to judge it on several criteria to understand what it brings to the model.

When dealing with huge datasets where high predictive accuracy matters, we need to inspect candidate algorithms accordingly.

XGBoost is one such algorithm.

XGBoost, short for Extreme Gradient Boosting, is a supervised machine learning algorithm well known for its scalability and high speed of execution on large datasets. It works on the framework of the gradient boosted model.

Boosting is a technique where models are trained one after another, with each new model built to correct the errors made by the ones before it.

Gradient boosting takes this idea further: in every iteration, the new predictor is fit to the residuals of the model from the previous iteration, so each addition nudges the ensemble's predictions closer to the true values.

XGBoost uses this gradient boosting framework to deliver highly efficient solutions with low computation time. It adds trees sequentially, each one learning from the residuals of the current ensemble, and combines them into a single strong model.
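To make this concrete, below is a minimal sketch of gradient boosting by hand with squared-error loss. It uses the built-in mtcars data and the rpart package purely for illustration; neither is part of the bike rental example that follows.

# A toy gradient booster: each new shallow tree is fit to the
# residuals of the current ensemble, then added with a learning rate.
library(rpart)

y = mtcars$mpg
X = mtcars[, c("wt", "hp")]

pred = rep(mean(y), length(y))   # start from a constant prediction
eta = 0.1                        # learning rate (shrinkage)

for (i in 1:50) {
  resid = y - pred               # residuals of the current model
  fit = rpart(resid ~ ., data = cbind(X, resid),
              control = rpart.control(maxdepth = 2))
  pred = pred + eta * predict(fit, X)   # add the new tree's contribution
}

mean((y - pred)^2)   # training error shrinks as trees are added

xgboost implements this same idea, adding regularization and heavily optimized (parallel, cache-aware) training on top.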


Advantages of XGBoost

  • Optimizes cache usage.
  • Executes processes in parallel.
  • Supports distributed computing.
  • Low execution time.
  • Handles relatively large datasets.

Practical Implementation of XGBoost in R

In this example, we use the Bike Rental Count Prediction problem, where the task is to predict the number of customers who will rent a bike under given environmental conditions.

You can find the dataset here!

First, we load the dataset into the R environment using the read.csv() function.

# Remove all existing objects from the workspace
rm(list = ls())

# Set the working directory
setwd("D:/Ediwsor_Project - Bike_Rental_Count/")
getwd()

# Load the dataset
bike_data = read.csv("day.csv", header = TRUE)

Next, we one-hot encode the categorical columns and split the dataset into training and test sets using the createDataPartition() function from caret.

#### SAMPLING OF DATA -- Splitting the data into training and test sets ####
# One-hot encode the categorical columns.
# (Note: the 'dummies' package has since been archived on CRAN;
# fastDummies or stats::model.matrix are common alternatives.)
categorical_col_updated = c('season','yr','mnth','weathersit','holiday')
library(dummies)
bike = bike_data
bike = dummy.data.frame(bike, categorical_col_updated)
dim(bike)

# Split the data into training (80%) and test (20%) sets.
library(caret)
set.seed(101)
split_val = createDataPartition(bike$cnt, p = 0.80, list = FALSE)
train_data = bike[split_val,]
test_data = bike[-split_val,]

# Mean Absolute Percentage Error (MAPE) -- our evaluation metric.
library(MLmetrics)
MAPE = function(y_actual, y_predict){
  mean(abs((y_actual - y_predict)/y_actual))*100
}
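As a quick sanity check on the metric (with illustrative numbers, not values from the dataset), MAPE averages the absolute percentage deviation of each prediction:

# |100-110|/100 = 0.10 and |200-190|/200 = 0.05; their mean is 0.075,
# so MAPE returns 7.5 (percent).
MAPE(c(100, 200), c(110, 190))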

Now it is time to apply the model. The 'xgboost' library provides the xgboost() function. Prior to modelling, we need to convert the data into matrix form, because xgboost() expects its input as a numeric matrix rather than a data frame.

After fitting the model, we use the predict() function to make predictions on the test data and evaluate the model with the MAPE error metric.

## MODEL 5: XGBoost
library(xgboost)

# Drop column 27 (the target 'cnt') and coerce everything to a numeric matrix.
train_matrix = as.matrix(sapply(train_data[-27], as.numeric))
test_matrix = as.matrix(sapply(test_data[-27], as.numeric))

# Train for 15 boosting rounds with default parameters.
xgboost_model = xgboost(data = train_matrix, label = train_data$cnt,
                        nrounds = 15, verbose = FALSE)

# Predict on the test set and evaluate with MAPE.
xgboost_predict = predict(xgboost_model, test_matrix)
xgboost_MAPE = MAPE(test_data[,27], xgboost_predict)
Accuracy_xgboost = 100 - xgboost_MAPE
print("MAPE: ")
print(xgboost_MAPE)
print('Accuracy of XGBOOST: ')
print(Accuracy_xgboost)

Output:

As a result, we obtain a MAPE of about 17.02, i.e. an accuracy (defined here as 100 - MAPE) of 82.97%.

> print("MAPE: ")
[1] "MAPE: "
> print(xgboost_MAPE)
[1] 17.02396
> print('Accuracy of XGBOOST: ')
[1] "Accuracy of XGBOOST: "
> print(Accuracy_xgboost)
[1] 82.97604
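The run above keeps xgboost()'s default parameters. As a possible next step (a sketch, not part of the original walkthrough), cross-validation with xgb.cv() can help pick the number of boosting rounds; the eta and max_depth values below are illustrative starting points, not tuned:

# Wrap the training data in an xgb.DMatrix so the label travels with it.
dtrain = xgb.DMatrix(data = train_matrix, label = train_data$cnt)

# 5-fold cross-validation with early stopping on the regression objective.
cv = xgb.cv(params = list(objective = "reg:squarederror",
                          eta = 0.1, max_depth = 6),
            data = dtrain, nrounds = 200, nfold = 5,
            early_stopping_rounds = 10, verbose = FALSE)
cv$best_iteration   # suggested number of rounds for the final model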

Conclusion

With this, we have come to the end of this topic. Feel free to comment below in case you have any questions.

For more such posts related to R programming, stay tuned with us!

Till then, Happy Learning!! 🙂
