Random Forest in Python


Hello, readers! In this article, we will focus on Random Forest in Python in detail.

So, let us begin!

What is a Random Forest Model?

Before building a random forest model, it helps to understand where the model fits in Machine Learning and Data Science.

Machine Learning offers us various algorithms that work on numeric as well as categorical data values.

Random Forest is one such Machine Learning model. It is a supervised model that handles both classification and regression, making predictions from labelled data.

To be precise, Random Forest is an ensemble model: it combines many decision trees into one larger model. Each tree is trained independently on a random bootstrap sample of the training data and considers only a random subset of the features at each split. The final prediction is the majority vote of the trees for classification, or the average of their outputs for regression. Combining many decorrelated trees in this way reduces the chance of prediction error compared with a single tree.

Because of this bagging (bootstrap aggregating) process, Random Forest mainly reduces variance, so it is less prone to overfitting the training data than a single decision tree.
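
To make the comparison with a single tree concrete, here is a minimal sketch that fits one decision tree and one random forest on the same synthetic data. The dataset (make_friedman1), the sample size and the hyperparameters are assumptions chosen for illustration; they are not taken from the bike rental example used later in this article.

```python
from sklearn.datasets import make_friedman1
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

# Synthetic regression data (assumption: stands in for any labelled dataset).
X, y = make_friedman1(n_samples=500, noise=1.0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

# One fully grown tree versus an ensemble of 100 trees.
tree = DecisionTreeRegressor(random_state=0).fit(X_train, y_train)
forest = RandomForestRegressor(n_estimators=100, random_state=0).fit(X_train, y_train)

# Compare the error of both models on the held-out test rows.
tree_mse = mean_squared_error(y_test, tree.predict(X_test))
forest_mse = mean_squared_error(y_test, forest.predict(X_test))

print("Single tree test MSE:  ", round(tree_mse, 3))
print("Random forest test MSE:", round(forest_mse, 3))
```

On data like this, the forest usually shows a lower test error than the single tree, although the exact numbers depend on the data and the random seed.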

Let us now focus on the steps to build a random forest model in Python.

Steps to build a Random Forest Model

  1. Draw a random sample of ‘x’ data points from the training data, with replacement (a bootstrap sample).
  2. Build a decision tree on those data points, considering a random subset of the features at each split.
  3. Choose the number of trees to be built and repeat steps 1 and 2 for each tree.
  4. To predict, pass the new data point through every tree and combine the results: a majority vote for classification, or the average for regression.
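
The bagging procedure can be sketched by hand with plain decision trees, each trained independently on its own bootstrap sample. This is only an illustration of the idea, not how sklearn implements it internally; the toy data, the number of trees and names such as n_trees and X_new are assumptions made up for this example.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)

# Toy training data (assumption: 200 rows, 3 features, made up for illustration).
X = rng.uniform(-1.0, 1.0, size=(200, 3))
y = X[:, 0] ** 2 + 2.0 * X[:, 1] + rng.normal(scale=0.1, size=200)

n_trees = 25
trees = []
for i in range(n_trees):
    # Step 1: draw a bootstrap sample (same size as the data, with replacement).
    idx = rng.integers(0, len(X), size=len(X))
    # Step 2: fit one tree on that sample; max_features=2 makes each split
    # consider only a random subset of the 3 features.
    tree = DecisionTreeRegressor(max_features=2, random_state=i)
    tree.fit(X[idx], y[idx])
    # Step 3: keep the tree and repeat until n_trees trees are built.
    trees.append(tree)

# Step 4: for regression, the ensemble prediction is the mean over all trees.
X_new = rng.uniform(-1.0, 1.0, size=(5, 3))
per_tree = np.stack([t.predict(X_new) for t in trees])  # shape (n_trees, 5)
ensemble_prediction = per_tree.mean(axis=0)
print(ensemble_prediction)
```

In practice, RandomForestRegressor does all of this for us; the loop is only meant to show that the trees do not depend on one another.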

Simple Implementation of Random Forest

For the implementation, we use the Bike Rental Count dataset (day.csv).

In this example, the response variable is continuous, so this is a regression problem and we use the Random Forest Regression algorithm.

First, we load the dataset into the Python environment using the read_csv() function. We then split it into training and test sets.

Then, we import the RandomForestRegressor class from the sklearn library to implement Random Forest.

  • First, we create a model object using the RandomForestRegressor class and fit it to the training set using the fit() function.
  • Next, with the predict() function, we predict the response values for the test set.
  • We use MAPE (Mean Absolute Percentage Error) as the error metric, computed by a custom MAPE() function.
  • Finally, to judge the quality of the model, we report 100 - MAPE as an accuracy score.
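
To see how MAPE and the accuracy score derived from it behave, here is a small worked example on made-up numbers. The helper name mape_percent and the values are assumptions for illustration; the formula is the usual definition of MAPE and requires that no true value is zero.

```python
import numpy as np

def mape_percent(y_true, y_pred):
    """Mean absolute percentage error, in percent (assumes no zeros in y_true)."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.mean(np.abs((y_true - y_pred) / y_true)) * 100)

# Made-up values: the errors are 10%, 10% and 0% of the true values.
y_true = [100, 200, 400]
y_pred = [110, 180, 400]

error = mape_percent(y_true, y_pred)
print("MAPE: {:.2f}%".format(error))              # MAPE: 6.67%
print("100 - MAPE: {:.2f}%".format(100 - error))  # 100 - MAPE: 93.33%
```

Note that 100 - MAPE is a convenient summary for regression, but it is not the same thing as classification accuracy.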


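import numpy as np
import pandas
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split

BIKE = pandas.read_csv("day.csv")

###SAMPLING OF DATA -- Splitting of Data columns into Training and Test dataset###
# One-hot encode the categorical columns.
categorical_col_updated = ['season','yr','mnth','weathersit','holiday']
bike = pandas.get_dummies(BIKE, columns=categorical_col_updated)

#Separating the dependent and independent data variables into two dataframes.
# Assumption: 'cnt' is the response and every remaining numeric or boolean column
# is a predictor. The preprocessing behind the reported results is not shown, so
# drop any columns that leak the target first; your numbers may differ.
X = bike.drop(columns=['cnt']).select_dtypes(include=['number','bool'])
Y = bike['cnt']

# 80% of the rows go to training and 20% to testing.
X_train, X_test, Y_train, Y_test = train_test_split(X, Y, train_size=0.80, random_state=0)

#Building the Random Forest Model on our dataset
Random_model = RandomForestRegressor(n_estimators=300).fit(X_train,Y_train)
Random_predict = Random_model.predict(X_test) #Predictions on Testing data

# Customized MAPE function (assumed definition; the original helper is not shown)
def MAPE(y_actual, y_predicted):
    return np.mean(np.abs((y_actual - y_predicted) / y_actual)) * 100

# Using MAPE error metrics to check for the error rate and accuracy level
Random_MAPE = MAPE(Y_test,Random_predict)
Accuracy_Random = 100 - Random_MAPE
print("MAPE: ",Random_MAPE)
print('Accuracy of Random Forest model: {:0.2f}%.'.format(Accuracy_Random))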


Here, we obtain a Mean Absolute Percentage Error of about 15.56%. Thus, the Random Forest model gives us an accuracy (100 - MAPE) of about 84.44%.

MAPE:  15.563241604682945
Accuracy of Random Forest model: 84.44%.


With this, we have come to the end of this topic. Feel free to comment below in case you come across any questions.

For more such posts related to Python, stay tuned with us.

Till then, Happy Learning!
