Root Mean Square Error in R


Hello, folks! Today, we will take a detailed look at an important error metric in R programming: Root Mean Square Error.

So, let us begin!! 🙂


First, what is Root Mean Square Error in R?

Before diving deep into the concept of Root Mean Square Error, let us first understand why we need it.

Error metrics play a very important role in the evaluation of machine learning models. They quantify how far a model's predictions are from the actual values, which lets us compare models in terms of accuracy and error rates.

Error metrics vary according to the type of machine learning algorithm. For a regression algorithm, i.e. one where the target variable is numeric, commonly used error metrics include Mean Absolute Error (MAE), Mean Squared Error (MSE), Root Mean Square Error (RMSE) and R squared.

Today, we will be focusing on Root Mean Square Error as an error metric.

Root Mean Square Error (RMSE) is a regression error metric. That is, it is used for numeric predictions. It helps us understand how well a regression line fits the data points.

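Have a look at the formula:

$$\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2}$$

Here, $n$ is the number of observations, $y_i$ is the actual value and $\hat{y}_i$ is the predicted value for observation $i$.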

RMSE is the square root of the average squared difference between the actual and predicted values of a model. Thus, it tells us how closely the data points are concentrated around the best-fit line.
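As a minimal sketch, the definition can be written directly in base R. The function name rmse_manual and the sample values are illustrative assumptions and do not come from any package.

```r
# Hypothetical helper: RMSE computed straight from its definition.
rmse_manual <- function(y_actual, y_predict) {
  stopifnot(length(y_actual) == length(y_predict))
  errors <- y_actual - y_predict   # prediction errors (residuals)
  sqrt(mean(errors^2))             # square, average, then take the root
}

# Illustrative values only
actual    <- c(3, 5, 7)
predicted <- c(2, 5, 9)

result <- rmse_manual(actual, predicted)
print(result)  # equals sqrt(5/3), since the squared errors are 1, 0 and 4
```

Squaring the errors before averaging means that large errors weigh more heavily than small ones, which is why a single poor prediction can dominate the result.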

Now, let us focus on the implementation of RMSE in R!


Calculating RMSE using the rmse() function from the Metrics package

The Metrics package in R provides the rmse() function, which takes the actual and the predicted values and returns the RMSE between them.

Example:

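# Remove all existing objects from the workspace
rm(list = ls())
# Install the Metrics package (needed only once per R installation)
install.packages("Metrics")
library(Metrics)

y_actual <- c(10, 20, 30, 40, 50)
y_predict <- c(9.8, 19.8, 30, 40, 52.5)

# rmse() takes the actual values first, then the predicted values
RMSE <- rmse(y_actual, y_predict)

print(RMSE)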

Output:

[1] 1.125167

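The purpose of RMSE is to measure how well the predicted values fit the target values. RMSE is never negative: a value of 0 means the predictions match the actual values exactly, and there is no fixed upper bound because RMSE is expressed in the same units as the target variable. For a given dataset, a lower RMSE indicates a better fitting model.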
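Because RMSE is expressed in the units of the target variable, its magnitude depends on the scale of the data. The following sketch, which uses a hypothetical base R helper called rmse_manual rather than the Metrics package, rescales the same example by a factor of 100 to illustrate this.

```r
# Hypothetical helper (base R only): RMSE from its definition
rmse_manual <- function(y_actual, y_predict) {
  sqrt(mean((y_actual - y_predict)^2))
}

y_actual  <- c(10, 20, 30, 40, 50)
y_predict <- c(9.8, 19.8, 30, 40, 52.5)

rmse_original <- rmse_manual(y_actual, y_predict)

# The same data expressed in a unit that is 100 times smaller
rmse_rescaled <- rmse_manual(y_actual * 100, y_predict * 100)

print(rmse_original)
print(rmse_rescaled)
print(rmse_rescaled / rmse_original)  # ratio is 100, up to floating-point rounding
```

For this reason, RMSE values are only comparable between models evaluated on the same target variable in the same units.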


Implementing RMSE with the Bike Rental Dataset

Let us now implement the RMSE error metric for the Bike Rental Count Prediction dataset, which the code below reads from the file day.csv.

Our main task here is to predict the count of customers that would rent a bike based on the different parameters provided.

Example:

Here, we first load the dataset into the R environment using the read.csv() function. Further, we split the data into training and test portions using the createDataPartition() function from the caret package.

Then, we load the MLmetrics library and build a function to calculate RMSE. Here, we calculate the Mean Square Error using the MSE() function and take the square root of the result to obtain the RMSE.

Finally, we fit a Linear Regression model on the training data and use the RMSE metric on the test data to judge the model's prediction error.

#Removed all the existing objects
rm(list = ls())
#Setting the working directory
setwd("D:/Ediwsor_Project - Bike_Rental_Count/")
getwd()

#Load the dataset
bike_data = read.csv("day.csv",header=TRUE)

### Dummy (one-hot) encoding of the categorical columns, before splitting the data ###
categorical_col_updated = c('season','yr','mnth','weathersit','holiday')
library(dummies)
bike = bike_data
bike = dummy.data.frame(bike,categorical_col_updated)
dim(bike)

#Splitting the rows into a training set (80%) and a test set (20%).
library(caret)
set.seed(101)
split_val = createDataPartition(bike$cnt, p = 0.80, list = FALSE) 
train_data = bike[split_val,]
test_data = bike[-split_val,]

library(MLmetrics)

#RMSE error metric -- square root of the Mean Square Error from MLmetrics::MSE()
RMSE = function(y_actual,y_predict){
  sqrt(MSE(y_predict,y_actual))
}

##MODEL 1: LINEAR REGRESSION
linear_model = lm(cnt~., train_data) #Building the Linear Regression Model on our dataset
summary(linear_model)
#Predictions on the test data; the target column cnt is selected by name rather than by position
linear_predict = predict(linear_model, test_data[, names(test_data) != "cnt"])

LR_RMSE = RMSE(test_data$cnt, linear_predict) # Using the RMSE error metric to check the prediction error
print("RMSE: ")
print(LR_RMSE)

Output:

[1] "RMSE: "
[1] 773.5291

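As a result, an RMSE of about 773.5 is obtained. Since RMSE is in the same units as the target, this means the predictions are off by roughly 773 rentals in the root-mean-square sense. Whether that counts as high depends on the typical magnitude of cnt, so it should be judged against the mean or range of the target.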
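One way to put an RMSE value into context is to compare it with a naive baseline that always predicts the training mean of the target. The sketch below illustrates the idea on R's built-in mtcars data, not on the bike rental data. The helper rmse_manual, the choice of predictors and the size of the hold-out set are assumptions made for illustration only.

```r
# Hypothetical helper (base R only): RMSE from its definition
rmse_manual <- function(y_actual, y_predict) {
  sqrt(mean((y_actual - y_predict)^2))
}

set.seed(101)
test_idx   <- sample(nrow(mtcars), size = 8)  # hold out 8 of the 32 rows
train_data <- mtcars[-test_idx, ]
test_data  <- mtcars[test_idx, ]

# Linear model with two illustrative predictors
model      <- lm(mpg ~ wt + hp, data = train_data)
model_pred <- predict(model, newdata = test_data)

# Baseline: always predict the training mean of the target
baseline_pred <- rep(mean(train_data$mpg), nrow(test_data))

model_rmse    <- rmse_manual(test_data$mpg, model_pred)
baseline_rmse <- rmse_manual(test_data$mpg, baseline_pred)

print(c(model = model_rmse, baseline = baseline_rmse))
print(model_rmse / mean(test_data$mpg))  # RMSE relative to the average target value
```

If the model's RMSE is not clearly below the baseline's, the predictors add little beyond the mean, whatever the absolute RMSE value looks like.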

We can try to reduce the RMSE through feature selection, for example with correlation analysis or ANOVA tests, and by removing outliers.


Conclusion

With this, we have come to the end of this topic. Feel free to comment below if you have any questions.

For more such posts related to R programming, stay tuned, and till then, Happy Learning!! 🙂
