Hello, readers! In this article, we will walk through an important concept in Machine Learning: **R squared (R2) in R** programming.

So, let us begin!!


## Importance of R squared error metric

Let us first understand the importance of error metrics in the domain of Data Science and Machine Learning!!

**Error metrics** enable us to evaluate the performance of a machine learning model on a particular dataset.

Different error metrics apply depending on the class of algorithm.

For classification algorithms, we evaluate the model with the Confusion Matrix, while R squared is an important metric for evaluating the predictions made by a regression algorithm.

`R squared (R2)` is a regression metric that measures how well the model fits the data. It represents the proportion of the variance in the response/target variable that is explained by the independent variables.

Thus, R squared describes how well the target variable is explained by the combination of the independent variables taken together.

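For a linear regression model with an intercept, evaluated on the data it was fitted on, the R squared value ranges from 0 to 1. On unseen data, the value from this formula can drop below 0 when the model predicts worse than simply using the mean of the target. It is given by the formula below: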

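$$R^2 = 1 - \frac{SS_{res}}{SS_{tot}}$$

Here,

- $SS_{res}$: The sum of squares of the residual errors, i.e. the squared differences between the actual and predicted values.
- $SS_{tot}$: The total sum of squares, i.e. the squared differences between the actual values and their mean.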

As a rule of thumb, the higher the R squared value, the better the model fits the data!
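To make the formula concrete, here is a minimal sketch that computes R squared by hand for a few made-up values. The function name `r_squared` and the numbers are assumptions chosen purely for illustration; they do not come from any dataset used in this article.

```r
# Hypothetical helper: R squared from the formula 1 - SS_res / SS_tot
r_squared <- function(y_actual, y_predict) {
  ss_res <- sum((y_actual - y_predict)^2)       # residual sum of squares
  ss_tot <- sum((y_actual - mean(y_actual))^2)  # total sum of squares
  1 - ss_res / ss_tot
}

# Made-up actual and predicted values
y_actual  <- c(3, 5, 7, 9)
y_predict <- c(2.5, 5.5, 7.5, 8.5)

r2 <- r_squared(y_actual, y_predict)
print(r2)  # SS_res = 1, SS_tot = 20, so R squared = 0.95
```

Each prediction is off by 0.5, so the residual sum of squares is small compared with the total spread of the actual values, which is why the value is close to 1.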

## I. R-Squared in R with Linear Regression

In this example, we have implemented the concept of R square error metric on the Linear Regression model.

- Initially, we load our dataset using the `read.csv()` function.
- The next step is to segregate the data into training and test datasets. This is achieved using the `createDataPartition()` function.
- Before modeling, we have specified the custom functions for our error metrics as seen in the below example.
- The last step is to apply the linear regression model using the `lm()` function and then call the user-defined R square function to evaluate the performance of the model.

**Example:**

```r
# Remove all the existing objects
rm(list = ls())

# Set the working directory
setwd("D:/Ediwsor_Project - Bike_Rental_Count/")
getwd()

# Load the dataset
bike_data = read.csv("day.csv", header = TRUE)

### SAMPLING OF DATA -- Splitting the data into Training and Test datasets ###
categorical_col_updated = c('season', 'yr', 'mnth', 'weathersit', 'holiday')
library(dummies)
bike = bike_data
bike = dummy.data.frame(bike, categorical_col_updated)
dim(bike)

# Split the rows into training (80%) and test (20%) data frames
library(caret)
set.seed(101)
split_val = createDataPartition(bike$cnt, p = 0.80, list = FALSE)
train_data = bike[split_val, ]
test_data = bike[-split_val, ]

### MODELLING OF DATA USING MACHINE LEARNING ALGORITHMS ###
# Defining error metrics to check the error rate and accuracy of the Regression ML algorithms

# 1. MEAN ABSOLUTE PERCENTAGE ERROR (MAPE)
MAPE = function(y_actual, y_predict) {
  mean(abs((y_actual - y_predict) / y_actual)) * 100
}

# 2. R SQUARED error metric -- Coefficient of Determination
# (computed here as the squared correlation between actual and predicted values)
RSQUARE = function(y_actual, y_predict) {
  cor(y_actual, y_predict)^2
}

## MODEL 1: LINEAR REGRESSION
linear_model = lm(cnt ~ ., train_data)  # Building the Linear Regression Model on our dataset
summary(linear_model)

# Predictions on the test data (the code treats column 27 as the target, cnt)
linear_predict = predict(linear_model, test_data[-27])

LR_MAPE = MAPE(test_data[, 27], linear_predict)   # MAPE on the test data
LR_R = RSQUARE(test_data[, 27], linear_predict)   # R squared on the test data
Accuracy_Linear = 100 - LR_MAPE

print("MAPE: ")
print(LR_MAPE)
print("R-Square: ")
print(LR_R)
print('Accuracy of Linear Regression: ')
print(Accuracy_Linear)
```

**Output:**

As seen below, the R square value is about 0.83, i.e. the model has worked well for our data.

```
> print("MAPE: ")
[1] "MAPE: "
> print(LR_MAPE)
[1] 17.61674
> print("R-Square: ")
[1] "R-Square: "
> print(LR_R)
[1] 0.8278258
> print('Accuracy of Linear Regression: ')
[1] "Accuracy of Linear Regression: "
> print(Accuracy_Linear)
[1] 82.38326
```
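One detail worth keeping in mind: the `RSQUARE` function in the example uses the squared correlation, whereas the formula given earlier is 1 - SS_res / SS_tot. On the data a linear model was fitted on, the two agree, but on test data they can differ. The sketch below uses made-up numbers to show the gap; the names `r2_cor` and `r2_formula` are assumptions for illustration only.

```r
# Made-up values: the predictions are perfectly correlated with the
# actual values but are twice as large, so they are badly off in scale
y_actual  <- c(1, 2, 3, 4, 5)
y_predict <- c(2, 4, 6, 8, 10)

# Squared correlation, as in the RSQUARE function above
r2_cor <- cor(y_actual, y_predict)^2

# Formula-based R squared: 1 - SS_res / SS_tot
ss_res <- sum((y_actual - y_predict)^2)       # 55
ss_tot <- sum((y_actual - mean(y_actual))^2)  # 10
r2_formula <- 1 - ss_res / ss_tot

print(r2_cor)      # 1, since the correlation is perfect
print(r2_formula)  # -4.5, since the predictions are far from the actuals
```

The squared correlation always lies between 0 and 1 and ignores systematic bias in the predictions, so reporting the formula-based value alongside it on test data can give a more cautious picture.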

## II. R square value using summary() function

We can also make use of the `summary()` function in R to extract the R square value after modelling.

In the below example, we have applied the linear regression model on our data frame and then used `summary()$r.squared` to get the R square value.

**Example:**

```r
rm(list = ls())

A <- c(1, 2, 3, 4, 2, 3, 4, 1)
B <- c(1, 2, 3, 4, 2, 3, 4, 1)
a <- c(10, 20, 30, 40, 50, 60, 70, 80)
b <- c(100, 200, 300, 400, 500, 600, 700, 800)
data <- data.frame(A, B, a, b)

print("Original data frame:\n")
print(data)

ml = lm(A ~ a, data = data)

# Extracting R-squared parameter from summary
summary(ml)$r.squared
```

**Output:**

```
[1] "Original data frame:\n"
  A B  a   b
1 1 1 10 100
2 2 2 20 200
3 3 3 30 300
4 4 4 40 400
5 2 2 50 500
6 3 3 60 600
7 4 4 70 700
8 1 1 80 800
[1] 0.03809524
```
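As a quick cross-check, the value reported by `summary()` can be reproduced from the formula 1 - SS_res / SS_tot using the residuals of the fitted model. This is only a sketch on the same toy vectors `A` and `a`; the variable names `r2_manual` and `r2_summary` are assumptions for illustration.

```r
# Same toy data as in the example above
A <- c(1, 2, 3, 4, 2, 3, 4, 1)
a <- c(10, 20, 30, 40, 50, 60, 70, 80)
data <- data.frame(A, a)

ml <- lm(A ~ a, data = data)

# R squared computed by hand from the model residuals
ss_res <- sum(residuals(ml)^2)            # residual sum of squares
ss_tot <- sum((data$A - mean(data$A))^2)  # total sum of squares
r2_manual <- 1 - ss_res / ss_tot

# R squared as reported by summary()
r2_summary <- summary(ml)$r.squared

print(r2_manual)
print(r2_summary)
print(all.equal(r2_manual, r2_summary))  # TRUE: both approaches agree
```

The low value here simply reflects that `a` explains very little of the variation in `A` in this toy data.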

## Conclusion

With this, we have come to the end of this topic. Feel free to comment below in case you come across any questions.

Till then, Happy Learning!! 🙂