Linear Regression in R


Hello readers! In this article, we will focus on an important Machine Learning concept: Linear Regression in R.

So, let us get started!


First, what is Linear Regression?

Supervised Machine Learning algorithms are broadly classified into Regression and Classification algorithms.

Linear Regression is one such Regression algorithm. The main task of a Linear Regression model is to predict a numeric value for a continuous target variable.

In other words, the basic task of a linear regression model is to find the best-fit line through the training data; the values for the test data are then predicted from this line.

It models the relationship between two or more variables by fitting them to the linear equation below. For a single predictor x, m is the slope of the line and c is the intercept; with several predictors, each one gets its own slope coefficient:

y = mx + c

In terms of modeling, a linear regression model attempts to extract a linear relationship between the input (independent) variables and a single response/dependent variable.
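To make the equation concrete, here is a minimal sketch that fits a line to synthetic data with lm(). The data, the "true" slope (2) and intercept (5), and all variable names are assumptions made up for illustration; they are not part of the bike rental example.

```r
# Synthetic data: y depends linearly on x, plus a little random noise
set.seed(42)
x <- 1:50
y <- 2 * x + 5 + rnorm(50, sd = 1)

# Fit the line y = m*x + c by ordinary least squares
fit <- lm(y ~ x)

# coef() returns the estimated intercept (c) and slope (m)
estimates   <- coef(fit)
intercept_c <- unname(estimates["(Intercept)"])
slope_m     <- unname(estimates["x"])
print(estimates)
```

The estimates should land close to the values used to generate the data, which is all "finding the best fit line" means here.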


Linear Regression – A Practical Approach

Now that we understand what Linear Regression is, let us walk through the steps of applying the model to a dataset.

In this example, we use the Bike Rental Count Prediction problem: predicting the number of bikes rented on a given day depending on weather and other conditions. You can find the dataset here!

1. Load the dataset

It is time to load the dataset into the R environment. To do this, we use the read.csv() function.

# Remove all existing objects from the workspace
rm(list = ls())
# Set the working directory (adjust the path to where day.csv is stored)
setwd("D:/Bike_Rental_Count/")
getwd()

# Load the dataset
bike_data = read.csv("day.csv",header=TRUE)
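After loading, it is worth checking that the data arrived as expected. The sketch below is self-contained: it writes a tiny stand-in CSV to a temporary folder (the file name, columns and values are made up for illustration) and reads it back with a full path, which is one way to avoid setwd().

```r
# Build a tiny stand-in for the real file (columns and values are made up)
sample_rows <- data.frame(
  season = c(1, 1, 2),
  temp   = c(0.34, 0.36, 0.20),
  cnt    = c(985, 801, 1349)
)
csv_path <- file.path(tempdir(), "day_sample.csv")
write.csv(sample_rows, csv_path, row.names = FALSE)

# Read it back using a full path, so no setwd() is needed
loaded <- read.csv(csv_path, header = TRUE)

# Quick sanity checks after loading
str(loaded)              # column names and types
dim(loaded)              # number of rows and columns
colSums(is.na(loaded))   # missing values per column
```

The same three checks can be run on bike_data right after read.csv().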

2. Splitting the dataset

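Having loaded the dataset into the R environment, it is now time to split the data into training and testing sets. Before splitting, we convert the categorical columns into dummy (0/1) columns with dummy.data.frame() from the dummies package. We then use the createDataPartition() function from the caret package to split the rows into training and testing data.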

### SAMPLING OF DATA -- Splitting the rows into Training and Test datasets ###
categorical_col_updated = c('season','yr','mnth','weathersit','holiday')

# Convert the categorical columns into dummy (0/1) columns
library(dummies)
bike = bike_data
bike = dummy.data.frame(bike,categorical_col_updated)
dim(bike)

# Split the rows: 80% for training, the remaining 20% for testing
library(caret)
set.seed(101)
split_val = createDataPartition(bike$cnt, p = 0.80, list = FALSE)
train_data = bike[split_val,]
test_data = bike[-split_val,]
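If the dummies or caret packages are not available, the same two steps can be approximated with base R only. This is a sketch of a swapped-in technique, not the code used above: the toy data frame and its values are made up, model.matrix() stands in for dummy.data.frame(), and a plain random sample stands in for createDataPartition().

```r
# Toy data frame; the columns and values are made up for illustration
set.seed(101)
toy <- data.frame(
  season = factor(sample(1:4, 100, replace = TRUE)),
  temp   = runif(100),
  cnt    = round(runif(100, 500, 5000))
)

# One-hot encode 'season' with base R.
# "- 1" drops the intercept so that every level gets its own column.
season_dummies <- model.matrix(~ season - 1, data = toy)
encoded <- cbind(as.data.frame(season_dummies), toy[, c("temp", "cnt")])

# Simple random 80/20 split of the rows.
# (createDataPartition() additionally balances the split across
# the range of the outcome variable.)
train_idx <- sample(seq_len(nrow(encoded)), size = floor(0.8 * nrow(encoded)))
train_toy <- encoded[train_idx, ]
test_toy  <- encoded[-train_idx, ]

dim(train_toy)
dim(test_toy)
```

One thing to keep in mind: when every level of a factor gets its own dummy column, one of those columns is redundant once the model has an intercept, and lm() reports NA for that coefficient. lm() can also take factor columns directly and encode them itself.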

3. Error Metrics for Evaluation

Evaluation is one of the most crucial steps in data modelling. Since this is a regression problem, we use MAPE (mean absolute percentage error) to measure how far the predictions are from the actual values.

#Defining error metrics to check the error rate and accuracy of the Regression ML algorithms

#1. MEAN ABSOLUTE PERCENTAGE ERROR (MAPE)
MAPE = function(y_actual,y_predict){
  mean(abs((y_actual-y_predict)/y_actual))*100
}
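A small worked example shows what the function returns. The actual and predicted values below are made up for illustration.

```r
# Same definition as above
MAPE <- function(y_actual, y_predict) {
  mean(abs((y_actual - y_predict) / y_actual)) * 100
}

# Made-up actual and predicted values
actual    <- c(100, 200, 400)
predicted <- c(110, 180, 400)

# Absolute percentage errors are 10%, 10% and 0%, so the mean is about 6.67%
mape_value <- MAPE(actual, predicted)
print(mape_value)

# A zero among the actual values means dividing by zero, so the result is Inf
MAPE(c(0, 100), c(10, 100))
```

Because of the division by y_actual, MAPE is only meaningful when the actual values are never zero.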

4. Modelling – Linear Regression

To apply linear regression, we use the lm() function that ships with R (in the stats package). After fitting the model on the training data, we use the predict() function to predict the values for the testing dataset with the same fitted model.

Finally, we evaluate the model using MAPE and an accuracy figure defined as 100 - MAPE.

# Build the Linear Regression model on the training data
linear_model = lm(cnt~., train_data)
summary(linear_model)

# Predictions on the testing data.
# cnt is column 27 in the prepared data; selecting it by name avoids depending on that position.
linear_predict=predict(linear_model,test_data[, names(test_data) != "cnt"])

# Use the MAPE error metric to check the error rate and accuracy level
LR_MAPE = MAPE(test_data$cnt,linear_predict)

Accuracy_Linear = 100 - LR_MAPE
print("MAPE: ")
print(LR_MAPE)
print('Accuracy of Linear Regression: ')
print(Accuracy_Linear)

Output:

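As seen below, linear regression applied to this dataset gives a MAPE of about 17.6%, that is, an accuracy of about 82% when accuracy is defined as 100 - MAPE. This means that, on average, the predictions deviate from the actual rental counts by about 17.6%; it does not mean that 82% of the values are predicted exactly.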

"MAPE: "
17.61674
"Accuracy of Linear Regression: "
82.38326
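Readers who do not have the bike rental file at hand can try the same fit, predict and evaluate steps on a dataset that ships with R. The sketch below uses mtcars; the choice of dataset, the predictors (wt and hp) and the variable names are assumptions made for illustration, and the resulting numbers have nothing to do with the bike rental results above.

```r
# MAPE as defined earlier in the article
MAPE <- function(y_actual, y_predict) {
  mean(abs((y_actual - y_predict) / y_actual)) * 100
}

# mtcars ships with R; predict fuel economy (mpg) from weight and horsepower
set.seed(101)
train_idx  <- sample(seq_len(nrow(mtcars)), size = floor(0.8 * nrow(mtcars)))
train_cars <- mtcars[train_idx, ]
test_cars  <- mtcars[-train_idx, ]

# Fit on the training rows only
cars_model <- lm(mpg ~ wt + hp, data = train_cars)

# Predict on the held-out rows and score the predictions with MAPE
cars_pred <- predict(cars_model, newdata = test_cars)
cars_mape <- MAPE(test_cars$mpg, cars_pred)
print(cars_mape)
```

With only a handful of test rows, the MAPE here will change noticeably with the seed, so treat it as a walkthrough of the steps rather than a benchmark.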

Conclusion

With this, we have come to the end of this topic. Feel free to comment below if you have any questions.

Try implementing Linear Regression in R on different datasets, and do let us know how it goes in the comment section.

Till then, stay tuned and happy learning! 🙂
