Correlation Regression Analysis in Python


Hey, folks! In this article, we will be focusing on Correlation Regression analysis to find the correlation between variables in Python.

So, let us begin!


What is Correlation Regression Analysis?

Correlation Regression Analysis is an important step in pre-processing a dataset before modeling. For any dataset, it is important to understand how the variables relate to one another and how each of them affects the target/response variable.

This is where Correlation Regression Analysis comes into the picture.

Correlation Analysis helps us analyze the following aspects of the data:

  • Relationship between the independent variables, i.e. how much the information they carry overlaps.
  • Effect of the independent variables on the dependent variable.

It is crucial for any developer to understand the correlation between the independent variables.

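The correlation coefficient ranges from -1 to 1. A value close to 1 or -1 indicates a strong positive or negative linear relationship, while a value close to 0 indicates little or no linear relationship. A high correlation (in absolute value) between two independent variables suggests that both variables carry largely the same information.

This gives rise to multicollinearity, and we can drop either of those variables.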
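To make the sign and the range of the coefficient concrete, here is a minimal sketch that computes the Pearson correlation coefficient by hand (Pearson is also the default method of the pandas corr() function). The helper name pearson and the sample values are hypothetical, made up purely for illustration.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of products of deviations from the means (the numerator)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    # Square roots of the sums of squared deviations (the denominator)
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)

# Hypothetical values, made up for illustration only
hours = [1, 2, 3, 4, 5]
doubled = [2, 4, 6, 8, 10]     # rises in step with hours
remaining = [10, 8, 6, 4, 2]   # falls as hours rises

r_pos = pearson(hours, doubled)
r_neg = pearson(hours, remaining)
print(round(r_pos, 6), round(r_neg, 6))
```

The first pair moves together perfectly, so its coefficient should land at 1 (up to floating-point error), while the second pair moves in opposite directions and should land at -1.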

Having understood the concept of Correlation, let us now try to implement it practically in the upcoming section.


Finding Correlation between variables

Let us first start by importing the dataset. You can find the dataset here. We have loaded the dataset into the environment using the read_csv() function.

Further, we have selected all the numeric variables of the dataset and stored their names in a list, because correlation works only on numeric data. We have then applied the corr() function to these columns to obtain the correlation matrix, which depicts the correlation between each pair of variables.

import pandas
# Load the dataset into a DataFrame
data = pandas.read_csv("Bank_loan.csv")
#Using Correlation analysis to depict the relationship between the numeric/continuous data variables
numeric_col = ['age','employ','address','income','debtinc','creddebt','othdebt']
# Compute the pairwise correlation matrix of the numeric columns
corr = data.loc[:,numeric_col].corr()
print(corr)

Output:

Correlation Regression Matrix
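As an aside, the numeric columns do not have to be listed by hand. Below is a minimal sketch that lets pandas pick them with select_dtypes() before calling corr(). The small DataFrame is hypothetical and made up for illustration; it is not taken from Bank_loan.csv.

```python
import pandas as pd

# Hypothetical rows, made up for illustration; not taken from Bank_loan.csv
df = pd.DataFrame({
    "age": [25, 32, 47, 51],
    "income": [30.0, 45.5, 80.0, 95.2],
    "city": ["A", "B", "A", "C"],   # text column, cannot be correlated
})

# Keep only the numeric columns instead of listing them by hand
numeric_df = df.select_dtypes(include="number")

# Pairwise correlation matrix (Pearson by default)
corr = numeric_df.corr()
print(corr)
```

Depending on the pandas version, calling corr() on a frame that still contains text columns either skips them or raises an error, so filtering first keeps the behaviour predictable.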

We can use the seaborn.heatmap() function to visualize the correlation matrix as a heatmap, with annot=True printing each coefficient inside its cell, as shown below:

import seaborn as sn
sn.heatmap(corr, annot=True)

Output:

Correlation Regression Heatmap
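Apart from reading the heatmap by eye, the highly correlated pairs can also be listed programmatically. The sketch below is one possible way to do it, not a definitive recipe: the data, the 0.8 cut-off, and the rule of dropping the second variable of each pair are all assumptions made for illustration.

```python
import pandas as pd

# Hypothetical data, made up for illustration: 'b' is exactly 2 * 'a'
df = pd.DataFrame({
    "a": [1, 2, 3, 4, 5],
    "b": [2, 4, 6, 8, 10],
    "c": [5, 1, 4, 2, 3],
})

corr = df.corr()
threshold = 0.8   # assumed cut-off; pick one that suits the data

# Collect each pair once (j > i skips the diagonal and the mirrored entries)
cols = list(corr.columns)
high_pairs = []
for i in range(len(cols)):
    for j in range(i + 1, len(cols)):
        r = corr.iloc[i, j]
        if abs(r) > threshold:
            high_pairs.append((cols[i], cols[j], float(r)))

# Drop the second variable of every flagged pair
to_drop = sorted({second for _, second, _ in high_pairs})
reduced = df.drop(columns=to_drop)

print(high_pairs)
print(list(reduced.columns))
```

Here only the pair (a, b) crosses the threshold, so b is dropped and a and c remain. Which variable of a pair to keep is a judgment call; in practice it usually depends on which one is more meaningful for the model.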

Conclusion

With this, we have come to the end of this topic. Feel free to comment below in case you come across any questions.

For more such posts related to Python, stay tuned @ Python with JournalDev, and till then, Happy Learning!! 🙂
