Decision Tree in R – A Practical Guide!


Hello, readers! In this article, we will be focusing on an important algorithm in the domain of Machine Learning — Decision Tree in R, in a stepwise approach.

So, let us begin! 🙂


First, what is a Decision Tree in R?

Modeling is one of the crucial steps in the domain of Data Science. With the help of modeling, one can achieve substantial predictions for a real-life problem to be solved.

A Decision Tree is a Machine Learning algorithm for both regression and classification. It is a non-parametric algorithm that produces an outcome by applying a rule or decision at every step of processing.

As the name suggests, it builds a tree of decisions or rules from the provided inputs, following an if-else structure.

Since it is a supervised Machine Learning algorithm, a Decision Tree learns from the historical data provided. It builds a tree (or flow chart) of rules and keeps applying the if-else approach until the variables are exhausted.
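To make the if-else intuition concrete, here is a hypothetical sketch: the function below mimics a tiny fitted tree for a bike-rental setting, with made-up thresholds and leaf values chosen purely for illustration (they are not taken from any real model).

```r
# A fitted decision tree is, in effect, a set of nested if-else rules.
# Thresholds and leaf counts here are invented for illustration only.
predict_rentals <- function(temp, humidity) {
  if (temp < 15) {
    if (humidity > 80) 1200 else 2500   # cold days
  } else {
    if (humidity > 80) 3100 else 4800   # warm days
  }
}

predict_rentals(10, 90)  # cold & humid leaf: 1200
```

A real tree learns these split points and leaf values from the training data instead of having them hand-written.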

Traits of a Decision Tree:

  1. Decision Tree Classifier: It identifies whether a particular input belongs to a category or label of data values.
  2. Decision Tree Regressor: The main task is to predict the estimated values for numeric data inputs.
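As a quick sketch of the two flavors, the snippet below fits both on R's built-in datasets (iris for classification, mtcars for regression); in the rpart package, the `method` argument ("class" vs "anova") is what switches between them.

```r
library(rpart)

# Classifier: predict a category (Species) -- method = "class"
clf <- rpart(Species ~ ., data = iris, method = "class")

# Regressor: predict a numeric value (mpg) -- method = "anova"
reg <- rpart(mpg ~ wt + hp, data = mtcars, method = "anova")

head(predict(clf, iris, type = "class"))  # factor labels
head(predict(reg, mtcars))                # numeric predictions
```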

Machine Learning Decision Tree in R – Practical Approach

Having understood the concept of Decision Trees, let us now try to implement the same.

Throughout this topic, we will work with the Bike Rental Count Prediction problem.

Our task is to use Decision Trees to predict the count of people that would opt to rent a bike depending on environmental conditions. You can find the dataset here!

If you analyze the dataset, you can clearly see that it is a regression problem, because the response variable ‘cnt’ is continuous in nature.

Let us now walk through the modeling steps.

Please find the entire code below!

#Remove all existing objects
rm(list = ls())
#Setting the working directory
setwd("D:/Ediwsor_Project - Bike_Rental_Count/")
getwd()

#Load the dataset
bike_data = read.csv("day.csv",header=TRUE)
categorical_col_updated = c('season','yr','mnth','weathersit','holiday')
library(dummies) #note: 'dummies' has been archived on CRAN; install from the CRAN archive if unavailable
bike = bike_data
bike = dummy.data.frame(bike,categorical_col_updated)
dim(bike)

#Splitting the data into train (80%) and test (20%) sets
library(caret)
set.seed(101)
split_val = createDataPartition(bike$cnt, p = 0.80, list = FALSE) 
train_data = bike[split_val,]
test_data = bike[-split_val,]

#1. MEAN ABSOLUTE PERCENTAGE ERROR (MAPE)
MAPE = function(y_actual,y_predict){
  mean(abs((y_actual-y_predict)/y_actual))*100
}

#2. R SQUARE error metric -- Coefficient of Determination
RSQUARE = function(y_actual,y_predict){
  cor(y_actual,y_predict)^2
}

#MODEL: DECISION TREE
library(rpart)
DT_model = rpart(cnt ~ ., data = train_data, method = "anova", minsplit = 5)
DT_predict = predict(DT_model, test_data[-27]) #column 27 is the response 'cnt'
DT_MAPE = MAPE(test_data[,27], DT_predict)
DT_R = RSQUARE(test_data[,27], DT_predict)
Accuracy_DT = 100 - DT_MAPE

print("MAPE: ")
print(DT_MAPE)
print("R-Square: ")
print(DT_R)
print('Accuracy of Decision Tree: ')
print(Accuracy_DT)
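Before trusting the two metric functions on real predictions, it can help to sanity-check them on a small toy vector where the expected values are easy to work out by hand:

```r
# Same metric definitions as in the article's code
MAPE <- function(y_actual, y_predict) {
  mean(abs((y_actual - y_predict) / y_actual)) * 100
}
RSQUARE <- function(y_actual, y_predict) {
  cor(y_actual, y_predict)^2
}

y_true <- c(100, 200, 300, 400)
y_pred <- c(110, 190, 330, 380)

MAPE(y_true, y_pred)     # mean of 10%, 5%, 10%, 5% = 7.5
RSQUARE(y_true, y_pred)  # close to 1 for these near-perfect predictions
```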

Explanation:

  1. First, we load the dataset into the R environment using the read.csv() function.
  2. Prior to modeling, it is important to split the dataset into training and test data. We achieve this using the createDataPartition() function from R's ‘caret‘ package.
  3. Now it is time to apply Decision Tree modeling to our dataset. Since this is a regression problem, we use a Decision Tree Regressor. The rpart() function from the ‘rpart’ package fits the Decision Tree algorithm to the dataset.
  4. Using R's predict() function, we make predictions on the test data with the fitted model.
  5. Finally, we evaluate the model with the regression error metrics MAPE, R-square, and accuracy.

Output:

[1] "MAPE: "
[1] 26.3328
[1] "R-Square: "
[1] 0.7260578
[1] "Accuracy of Decision Tree: "
[1] 73.6672

As a result, our Decision Tree model has obtained an accuracy of 73.66% on the test data.
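A common next step, not shown in the article's code, is to inspect and prune the fitted tree. The sketch below does this on the built-in mtcars data (since the bike dataset is not bundled with R); pruning back to the complexity level with the lowest cross-validated error often improves accuracy on unseen data.

```r
library(rpart)

# Fit on a built-in dataset so the sketch is self-contained
fit <- rpart(mpg ~ ., data = mtcars, method = "anova", minsplit = 5)

print(fit)    # text view of the splits and leaf values
printcp(fit)  # cross-validated error at each complexity level

# Prune back to the cp value with the lowest cross-validated error
best_cp <- fit$cptable[which.min(fit$cptable[, "xerror"]), "CP"]
pruned  <- prune(fit, cp = best_cp)
```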


Conclusion

With this, we have come to the end of this topic. Feel free to comment below in case you have any questions.

Try implementing Decision Trees on other datasets and let us know how it goes.

For more such posts related to R programming, stay tuned and till then, Happy Learning!! 🙂
