Hello, readers! In this article, we explore another interesting topic in Machine Learning with R programming: Correlation Regression Analysis, in detail.
So, let us begin!! 🙂
First, what is Correlation Regression Analysis?
Before diving deep into the implementation of Correlation Regression Analysis using R programming, let us first have a glance at Correlation in data variables.
In Machine Learning and Data Science, the main step before modeling is data pre-processing. In this step, we analyze the effect of every independent variable on the other variables as well as on the target value, and we also clean the data of errors and noise.
This is where Correlation Regression Analysis comes into the picture.
Correlation is a measure of how strongly the variables of a dataset depend on one another. With correlation, we can analyze the relationships among the independent variables of the dataset. Correlation works for continuous variables only, i.e. numeric data variables.
That is why it is termed Correlation Regression Analysis.
In addition, this analysis can help us select the relevant variables (features) from a large dataset.
Correlation values range from -1 to +1. A value of +1 indicates the highest positive correlation, while -1 indicates the highest negative correlation. A value close to 0 indicates little or no linear relationship.
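To make these two extremes concrete, here is a minimal sketch using made-up vectors. The names x, y_pos and y_neg are illustrative assumptions and are not part of the Bank Loan dataset used later.

```r
# Illustrative vectors (hypothetical data, not from the Bank Loan dataset)
x     <- c(1, 2, 3, 4, 5)
y_pos <- 2 * x + 1      # rises exactly in step with x
y_neg <- -3 * x + 10    # falls exactly in step with x

r_pos <- cor(x, y_pos)  # perfect positive linear relationship
r_neg <- cor(x, y_neg)  # perfect negative linear relationship

print(r_pos)
print(r_neg)
```

The two results should come out as 1 and -1 respectively, up to floating-point rounding. Real-world variables usually fall somewhere in between.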
If two or more variables are highly positively correlated with each other, we can assume that they carry largely the same information about the target/response variable. In that case, we are free to drop one of them.
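One possible way to act on this idea is sketched below with base R only. The data frame, its column names and the 0.9 cut-off are all assumptions made for illustration. This is not a fixed rule, just one reasonable approach.

```r
# Hypothetical data: 'b' is almost a copy of 'a', 'noise' is unrelated
set.seed(42)
a <- rnorm(100)
demo <- data.frame(a = a,
                   b = a + rnorm(100, sd = 0.05),
                   noise = rnorm(100))

demo_cor <- cor(demo)

# Keep only the upper triangle so that each pair is inspected once
cor_upper <- abs(demo_cor)
cor_upper[lower.tri(cor_upper, diag = TRUE)] <- 0

threshold <- 0.9  # assumed cut-off; pick one that suits your data

# Flag a column if it is highly correlated with any earlier column
flagged <- apply(cor_upper, 2, function(col) any(col > threshold))
to_drop <- colnames(cor_upper)[flagged]

reduced <- demo[, !(names(demo) %in% to_drop), drop = FALSE]
print(to_drop)
print(names(reduced))
```

Here only the later variable of each highly correlated pair is flagged, so one member of the pair always stays in the data.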
Correlation Regression Analysis uses a correlation matrix to present the correlation values, with one row and one column per variable.
Let us now implement Correlation Regression using R as the primary programming language.
Correlation Regression Analysis using R
R as a programming language contains various functions and packages to perform tasks.
To perform Correlation Regression Analysis, the easiest way is to use the in-built cor() function.
The cor() function takes the data frame (dataset) as the input and returns the correlation matrix as the result.
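As a quick, reproducible illustration of that input and output, the sketch below runs cor() on three columns of mtcars, a dataset that ships with R. It is used here only as a stand-in and is not the dataset from this article.

```r
# mtcars ships with R; used here only as a stand-in dataset
demo_data <- mtcars[, c("mpg", "hp", "wt")]

# cor() computes the Pearson correlation by default
demo_cor <- cor(demo_data)

# Round for readability when printing
print(round(demo_cor, 2))
```

The result is a square, symmetric matrix with 1 on the diagonal, since every variable is perfectly correlated with itself. If your data contains missing values, cor() also accepts a use argument such as use = "complete.obs", and a method argument such as method = "spearman" for rank-based correlation.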
Let us consider the example below. In this example, we use the Bank Loan dataset. You can find the dataset here!
Initially, we have loaded the dataset into the R environment using the read.csv() function. Further, we have extracted the numeric (continuous) data columns of the dataset into a separate variable as shown below.
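If you would rather not hard-code column positions, one alternative is to pick the numeric columns by type. The sketch below uses a small made-up data frame with hypothetical column names, since the actual structure of the Bank Loan file is not shown here.

```r
# Hypothetical stand-in for a loaded dataset
demo <- data.frame(
  age    = c(25, 40, 31),
  income = c(30.5, 82.0, 45.2),
  city   = c("Pune", "Delhi", "Mumbai"),
  stringsAsFactors = FALSE
)

# TRUE for every column that holds numeric data
numeric_cols <- sapply(demo, is.numeric)

# drop = FALSE keeps the result a data frame even if one column remains
demo_numeric <- demo[, numeric_cols, drop = FALSE]
print(names(demo_numeric))
```

Selecting by type keeps working if columns are later reordered. Note that it will also pick up numeric ID or code columns, which you may want to exclude by hand.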
Now, we use the cor() function to generate a correlation matrix from the extracted numeric data.
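# Load the dataset into the R environment
bank <- read.csv("bank loan.csv", header = TRUE, sep = ",", stringsAsFactors = FALSE)
# Keep only the numeric (continuous) columns, selected by position
my_data <- bank[, c(1, 3, 4, 5, 6, 7, 8)]
head(my_data)
# Compute and print the correlation matrix
cor_mat <- cor(my_data)
print("Correlation Matrix:")
print(cor_mat)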
Apart from the matrix representation, we can also visualize the correlation analysis using the corrplot package in R.
The corrplot() function accepts the correlation matrix object and the method of visualization as input, and draws the correlation plot:
library(corrplot)  # run install.packages("corrplot") first if it is not installed
corrplot(cor_mat, method = "circle")
Instead of 'circle' as the method of visualization, we can use method='number' to show the corresponding correlation values of the variables in the plot:
corrplot(cor_mat, method = "number")
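Because the Bank Loan file may not be at hand, here is a self-contained sketch of the same kind of plot on the built-in mtcars dataset. It assumes the corrplot package is installed and skips the plot otherwise. The type = "upper" option, which hides the mirrored lower half of the matrix, is an addition for illustration and is not used in the example above.

```r
# mtcars ships with R; used here only as a stand-in dataset
plot_cor <- cor(mtcars[, c("mpg", "hp", "wt", "qsec")])

# Draw the plot only if the corrplot package is available
if (requireNamespace("corrplot", quietly = TRUE)) {
  # Show correlation values as numbers, upper triangle only
  corrplot::corrplot(plot_cor, method = "number", type = "upper")
} else {
  message("corrplot is not installed; skipping the plot")
}
```

Since a correlation matrix is symmetric, showing only one triangle can make larger plots easier to read.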
With this, we have come to the end of this topic. Feel free to comment below if you have any questions, and do let us know your understanding of this topic in the comment section. Till then, Happy Learning!! 🙂