Random Forest in R – Practical Implementation


Hello readers!! In this article, we will be focusing on Random Forest in R programming in detail.

So, let us get started!


First, how does Random Forest work?

Before getting started with Random Forests, let us first understand the importance of Machine Learning algorithms.

Using machine learning algorithms, one can make predictions about real-life scenarios and problems. Supervised Machine Learning algorithms are broadly classified into Regression and Classification algorithms.

Regression algorithms predict a numeric, continuous target value, while Classification algorithms predict a categorical target, i.e. the group (class) that an observation belongs to.

Random Forest is one such Machine Learning algorithm. It can be used both as a Classification and as a Regression algorithm.

It is based on the concept of ensemble learning: rather than relying on a single model, it combines the predictions of many models (here, many decision trees) to obtain a model that is more accurate and more stable than any individual one.
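As a small illustration of how an ensemble turns many trees' outputs into one prediction, the sketch below averages numeric predictions (regression) and takes a majority vote over class labels (classification). The helper names `aggregate_regression()` and `aggregate_classification()` and the tree outputs are hypothetical, and the tie-breaking rule (first label in sorted order) is a choice of this sketch, not necessarily what the randomForest package does.

```r
# Combine per-tree outputs: average for regression, majority vote for classification.

# Regression: each column holds one tree's numeric predictions for the same rows
aggregate_regression <- function(tree_preds) {
  rowMeans(tree_preds)
}

# Classification: each column holds one tree's predicted class labels
aggregate_classification <- function(tree_votes) {
  apply(tree_votes, 1, function(votes) {
    counts <- table(votes)            # labels are sorted alphabetically
    names(counts)[which.max(counts)]  # ties go to the first label in sorted order
  })
}

# Three hypothetical trees predicting two rows
reg_preds <- cbind(tree1 = c(10, 20), tree2 = c(12, 22), tree3 = c(14, 18))
aggregate_regression(reg_preds)       # 12 20

cls_votes <- cbind(tree1 = c("yes", "no"), tree2 = c("yes", "yes"), tree3 = c("no", "no"))
aggregate_classification(cls_votes)   # "yes" "no"
```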

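A Random Forest model builds many decision trees independently of each other. Each tree is trained on a bootstrap sample of the training data (a random sample drawn with replacement), and at every split it considers only a random subset of the predictors. This randomness makes the trees differ from one another, so their individual errors partly cancel out when their predictions are combined.

  1. At first, we select N random data points, with replacement, from the dataset (preferably the training data set).
  2. A decision tree is built over those N data points, using only a random subset of the predictors at each split.
  3. Steps 1 and 2 are repeated until the chosen number of trees has been built.
  4. To predict, every tree makes its own prediction; the forest takes the majority vote (classification) or the average (regression).
  5. Finally, we make predictions on the test data values.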
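To make the procedure concrete, here is a minimal from-scratch sketch of a regression random forest in base R, assuming numeric predictors: each tree is grown on a bootstrap sample, only `mtry` randomly chosen predictors are tried at each split, and the forest prediction is the average over all trees. All function names are hypothetical. The default `mtry = floor(p/3)` matches the randomForest package's default for regression, but for speed the trees here are capped at `max_depth` levels (randomForest grows much deeper trees), so treat this as an illustration, not a replacement for the package.

```r
# Minimal regression random forest in base R (illustrative sketch only).
# Assumes a numeric predictor matrix x and a numeric target y.

# Best split among the candidate features, by total squared error of the two children
best_split <- function(x, y, features) {
  best <- NULL
  best_sse <- Inf
  for (f in features) {
    values <- sort(unique(x[, f]))
    if (length(values) < 2) next
    thresholds <- (head(values, -1) + tail(values, -1)) / 2  # midpoints between values
    for (thr in thresholds) {
      left <- x[, f] <= thr
      sse <- sum((y[left] - mean(y[left]))^2) + sum((y[!left] - mean(y[!left]))^2)
      if (sse < best_sse) {
        best_sse <- sse
        best <- list(feature = f, threshold = thr)
      }
    }
  }
  best
}

# Grow one tree; every split looks only at mtry randomly chosen predictors
grow_tree <- function(x, y, mtry, max_depth, min_split, depth = 0) {
  if (depth >= max_depth || length(y) < min_split) {
    return(list(leaf = TRUE, value = mean(y)))
  }
  chosen <- best_split(x, y, sample.int(ncol(x), mtry))
  if (is.null(chosen)) return(list(leaf = TRUE, value = mean(y)))
  left <- x[, chosen$feature] <= chosen$threshold
  list(leaf = FALSE, feature = chosen$feature, threshold = chosen$threshold,
       left  = grow_tree(x[left, , drop = FALSE], y[left], mtry, max_depth, min_split, depth + 1),
       right = grow_tree(x[!left, , drop = FALSE], y[!left], mtry, max_depth, min_split, depth + 1))
}

# Follow one row down a tree to its leaf value
predict_tree <- function(node, row) {
  while (!node$leaf) {
    node <- if (row[node$feature] <= node$threshold) node$left else node$right
  }
  node$value
}

# Grow ntree trees, each on a bootstrap sample of N rows drawn with replacement
random_forest <- function(x, y, ntree = 50, mtry = max(1, floor(ncol(x) / 3)),
                          max_depth = 4, min_split = 5) {
  lapply(seq_len(ntree), function(i) {
    idx <- sample.int(nrow(x), nrow(x), replace = TRUE)
    grow_tree(x[idx, , drop = FALSE], y[idx], mtry, max_depth, min_split)
  })
}

# Forest prediction = average of the individual tree predictions
predict_forest <- function(forest, newx) {
  per_tree <- sapply(forest, function(tree) apply(newx, 1, function(r) predict_tree(tree, r)))
  rowMeans(matrix(per_tree, nrow = nrow(newx)))
}

# Usage on synthetic data: y depends on x1 and x2; x3 is pure noise
set.seed(42)
n <- 200
x <- matrix(runif(n * 3), ncol = 3)
y <- 5 * x[, 1] + 2 * x[, 2] + rnorm(n, sd = 0.3)
forest <- random_forest(x, y, ntree = 25, mtry = 2)
pred <- predict_forest(forest, x)
mean((y - pred)^2)  # training MSE; compare with var(y)
```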

Let us now practically understand the implementation of Random Forest in R programming.


Steps to implement Random Forest in R

1. Initially, we load the dataset into the R environment using the read.csv() function. In this example, we work on the Bike Rental Count Prediction problem. You can find the dataset here!
# Remove all existing objects from the workspace
rm(list = ls())
# Set the working directory to the folder that contains day.csv
setwd("D:/Ediwsor_Project - Bike_Rental_Count/")
# Print the working directory to confirm it
getwd()

# Load the dataset
bike_data = read.csv("day.csv", header = TRUE)
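The snippet above loads every column of day.csv, while the model later uses the formula `cnt ~ .`, i.e. every other column as a predictor. In the UCI Bike Sharing daily data, `instant` is a record index, `dteday` is the date, and `cnt` is the sum of `casual` and `registered`, so keeping those columns would leak the target into the predictors. A minimal sketch of dropping them, assuming the standard day.csv column names; the helper `drop_non_predictors()` and the toy values are hypothetical.

```r
# Hypothetical helper: drop columns that should not be used to predict cnt.
# Assumes the standard UCI Bike Sharing day.csv column names.
drop_non_predictors <- function(df) {
  # instant = record index, dteday = date,
  # casual + registered add up to cnt (target leakage)
  df[, setdiff(names(df), c("instant", "dteday", "casual", "registered")), drop = FALSE]
}

# Toy rows with day.csv-style column names (values are illustrative only)
toy_day <- data.frame(instant = 1:2, dteday = c("2011-01-01", "2011-01-02"),
                      season = c(1, 1), temp = c(0.30, 0.35),
                      casual = c(10, 20), registered = c(90, 80), cnt = c(100, 100))
names(drop_non_predictors(toy_day))  # keeps season, temp and cnt
```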

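2. Next, we split the dataset into training and test sets. Before that, we create dummy (0/1) variables for the categorical columns using the dummy.data.frame() function from the dummies package. We then use the createDataPartition() function from the caret package to put 80% of the rows into the training set and the remaining 20% into the test set.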

###SAMPLING OF DATA -- Splitting the rows into Training and Test datasets###
# Categorical columns to be converted into dummy (0/1) variables
categorical_col_updated = c('season','yr','mnth','weathersit','holiday')
library(dummies)
bike = bike_data
# Replace each categorical column with one 0/1 column per category
bike = dummy.data.frame(bike, categorical_col_updated)
dim(bike)
# Split the rows into training (80%) and test (20%) sets
library(caret)
set.seed(101)
split_val = createDataPartition(bike$cnt, p = 0.80, list = FALSE)
train_data = bike[split_val,]
test_data = bike[-split_val,]
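For readers who want to see what these two steps do, here is a base-R sketch of both. caret's documentation describes createDataPartition() as sampling within percentile-based groups when the outcome is numeric; `quantile_split()` imitates that idea but is not caret's code. Likewise, `one_hot()` is not the dummies implementation (for example, it appends the new columns at the end instead of in place). Both helper names and the toy data are hypothetical.

```r
# Hypothetical base-R helpers, not the dummies or caret implementations.

# Replace each categorical column with one 0/1 column per observed value,
# named like "season1", "season2", ... (new columns are appended at the end)
one_hot <- function(df, cols) {
  for (col in cols) {
    for (lv in sort(unique(df[[col]]))) {
      df[[paste0(col, lv)]] <- as.integer(df[[col]] == lv)
    }
    df[[col]] <- NULL
  }
  df
}

# Return training-row indices for proportion p, sampling within quantile
# groups of a numeric target so both sets cover its whole range
quantile_split <- function(y, p = 0.8, groups = 5) {
  breaks <- unique(quantile(y, probs = seq(0, 1, length.out = groups + 1)))
  bins <- cut(y, breaks = breaks, include.lowest = TRUE)
  idx <- lapply(split(seq_along(y), bins), function(rows) {
    rows[sample.int(length(rows), round(p * length(rows)))]
  })
  sort(unname(unlist(idx)))
}

# Usage on a toy data frame (values are illustrative)
set.seed(101)
toy_bike <- data.frame(season = rep(1:4, each = 25), cnt = sample(100:900, 100))
encoded <- one_hot(toy_bike, "season")
names(encoded)   # "cnt" "season1" "season2" "season3" "season4"
train_idx <- quantile_split(encoded$cnt, p = 0.8)
train_toy <- encoded[train_idx, ]
test_toy <- encoded[-train_idx, ]
c(train = nrow(train_toy), test = nrow(test_toy))
```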

3. Now it is time to define the error metrics for the model. Here, we have built custom functions for two metrics: MAPE (Mean Absolute Percentage Error) and R-square.

###MODELLING OF DATA USING MACHINE LEARNING ALGORITHMS####
#Defining error metrics to check the error rate and accuracy of the Regression ML algorithms

#1. MEAN ABSOLUTE PERCENTAGE ERROR (MAPE)
MAPE = function(y_actual,y_predict){
  mean(abs((y_actual-y_predict)/y_actual))*100
}

#2. R SQUARE -- squared correlation between actual and predicted values (used here as the Coefficient of Determination)
RSQUARE = function(y_actual,y_predict){
  cor(y_actual,y_predict)^2
}
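A quick sanity check of the two metric functions on hand-made numbers. `RSQUARE_SS()` is a hypothetical addition that computes R-square as 1 - SSE/SST. For an ordinary least-squares fit with an intercept, evaluated on its own training data, this equals the squared correlation; on test predictions the two can differ a lot, because the squared correlation ignores a constant bias. Note also that MAPE is undefined when any actual value is 0.

```r
# Same definitions as in the article
MAPE = function(y_actual, y_predict){
  mean(abs((y_actual - y_predict) / y_actual)) * 100
}
RSQUARE = function(y_actual, y_predict){
  cor(y_actual, y_predict)^2
}
# Hypothetical alternative: R-square as 1 - SSE/SST
RSQUARE_SS = function(y_actual, y_predict){
  1 - sum((y_actual - y_predict)^2) / sum((y_actual - mean(y_actual))^2)
}

# Both predictions are off by 10%, so MAPE is 10
MAPE(c(100, 200), c(90, 220))

# Predictions perfectly correlated with the actuals but shifted by 10
actual <- c(1, 2, 3, 4)
shifted <- actual + 10
RSQUARE(actual, shifted)     # 1: correlation ignores the constant shift
RSQUARE_SS(actual, shifted)  # -79: SSE = 400, SST = 5
```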

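4. Finally, we use the randomForest() function to fit the model on the training data. Here, we set the number of trees to be built (ntree) to 300, and importance = TRUE asks randomForest() to also compute variable-importance measures. After that, we use the predict() function to make predictions on the test data and evaluate them with the error metrics defined above.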

##MODEL 3: RANDOM FOREST
library(randomForest)
set.seed(123)
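RF_model = randomForest(cnt ~ ., data = train_data, ntree = 300, importance = TRUE)
# Predict on the test set; the target column is excluded by name, not by position
RF_predict = predict(RF_model, test_data[, names(test_data) != "cnt"])
RF_MAPE = MAPE(test_data$cnt, RF_predict)
RF_R = RSQUARE(test_data$cnt, RF_predict)
# "Accuracy" is defined here as 100 - MAPE
Accuracy_RF = 100 - RF_MAPE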
print("MAPE: ")
print(RF_MAPE)
print("R-Square: ")
print(RF_R)
print('Accuracy of Random Forest: ')
print(Accuracy_RF)

Output:

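As seen below, the model returns a MAPE of about 16.57% and an R-square of about 0.88. The accuracy printed last is computed as 100 - MAPE, which gives 83.43%.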

[1] "MAPE: "
[1] 16.56531
[1] "R-Square: "
[1] 0.880482
[1] "Accuracy of Random Forest: "
[1] 83.43469

Conclusion

By this, we have come to the end of this topic. Feel free to comment below in case you have any questions.

For more such posts related to R programming, stay tuned, and till then, Happy Learning!! 🙂
