Data, data, and data. It is everywhere. People like you, working in various data-related roles, will spend more than half of their time processing and exploring data. Exploring your data thoroughly is essential to understanding it and extracting key insights from it.
So, here comes EDA (Exploratory Data Analysis), where you dive into the data to see what you can learn from it. As an R user, you are in luck, because R offers many packages that help you explore your data. Today, let’s talk about one of them: DataExplorer. With it, exploring data becomes not only easier but also faster!
What is DataExplorer in R?
Since exploring data is such an important part of any analysis, DataExplorer will be your best mate from now on.
One of the main features of DataExplorer is that it can generate a simple EDA report on your data. For this, the create_report() function comes into play.
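As a rough sketch of what that call might look like, the snippet below builds a report for the built-in iris dataset. It assumes DataExplorer is already installed, along with rmarkdown and pandoc, which the report rendering relies on. The output file name, directory, and title are illustrative choices, not requirements.

```r
# Assumes DataExplorer is installed and rmarkdown/pandoc are available
library(DataExplorer)

# Write the HTML report to a temporary directory (illustrative location)
create_report(
  datasets::iris,
  output_file  = "iris_report.html",
  output_dir   = tempdir(),
  report_title = "Iris EDA Report"
)
```

Once the call finishes, the report should be sitting in the chosen directory as a standalone HTML file you can open in a browser.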
But, what if you want to explore data yourself?
Worry not! DataExplorer offers many functions you can use to explore the data yourself. You can plot or inspect:
- String / categorical data
- Density plots
- Missing values
- Basic data stats
- Correlation heatmap and more.
Installing DataExplorer in R
First things first!
Let’s install the DataExplorer package and load it into R. You can run the code below on your PC to install and load the library.
#Install required package
install.packages('DataExplorer')

#Load the library
library(DataExplorer)
Well, we are now ready to explore data with DataExplorer.
Import the Data
So, we need to import some data to perform EDA. I will go with the built-in iris dataset, but I don’t want you to be limited to this. Go with any dataset that is available to you.
#Import and load data into R
df <- datasets::iris
df
Have a look at the data!
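For a first look, base R already has a few helpers. The sketch below uses only built-in functions, so it works even before DataExplorer is loaded.

```r
df <- datasets::iris

head(df)     # first six rows
str(df)      # column types and a preview of the values
summary(df)  # per-column summary statistics
```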
Plot the structure of the Data
If you want to quickly visualize the whole structure of the data, you can use the plot_str() function. I love this function because it gives you the structure of the data at a glance.
In the resulting diagram, you can hover over a node, which highlights it with a simple animation.
#Plots the structure of the data
plot_str(df)
KYD – Know Your Data
- Yep! First you need to know your data. So, DataExplorer offers the introduce() function, which gives you enough basic information about your data to get started.
- Run the below code to get this result.
#Gives basic information about the data
introduce(df)
rows                   150
columns                5
discrete_columns       1
continuous_columns     4
all_missing_columns    0
total_missing_values   0
complete_rows          150
total_observations     750
memory_usage           7976
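To make these fields less of a black box, here is a rough base R sketch that recomputes most of them by hand. It is only an approximation of what the fields mean, not how introduce() is implemented; the names is_num and overview are made up for this example, and memory_usage is left out.

```r
df <- datasets::iris

# TRUE for numeric (continuous) columns, FALSE for everything else
is_num <- vapply(df, is.numeric, logical(1))

overview <- c(
  rows                 = nrow(df),
  columns              = ncol(df),
  discrete_columns     = sum(!is_num),
  continuous_columns   = sum(is_num),
  all_missing_columns  = sum(vapply(df, function(col) all(is.na(col)), logical(1))),
  total_missing_values = sum(is.na(df)),
  complete_rows        = sum(complete.cases(df)),
  total_observations   = nrow(df) * ncol(df)
)

print(overview)
```

For iris, these values should line up with the introduce() output above.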
Plot the Data
- Well, previously you got the basic stats of your data as text. Now, it’s time to visualize them. You can use the plot_intro() function for this. Don’t wait!
#Plots the basic stats of the data
plot_intro(df)
Don’t Forget Missing Values
- Missing values can hurt your analysis if they are not recognized and treated properly. So, you can use the plot_missing() function to visualize the missing values, if any, in the data.
#Plots the missing values in the data
plot_missing(df)
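The iris data has no missing values, so this plot is not very exciting on its own. To see plot_missing() at work, one option is to blank out a few values on a copy of the data first. The copy name df_na, the seed, and the choice of 15 rows are arbitrary assumptions made purely for illustration.

```r
library(DataExplorer)

df <- datasets::iris

# Work on a copy so the original data stays untouched
df_na <- df
set.seed(42)  # arbitrary seed, only for reproducibility
df_na$Sepal.Length[sample(nrow(df_na), 15)] <- NA

# Tabular view: missing count and share per column
profile_missing(df_na)

# The same information as a plot
plot_missing(df_na)
```

Only Sepal.Length should show any missing values here; the other columns stay complete.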
Density plot using DataExplorer
- Density plots are useful for understanding the distribution of numerical variables. So, as a part of exploring the data, let’s plot a density graph that shows the distribution.
- You can use the plot_density() function for this purpose.
#Plots density graph
plot_density(df)
Histogram Using DataExplorer
- Histograms are useful for spotting outliers and for understanding how the data is distributed across intervals. Let’s plot a histogram using the plot_histogram() function.
#Plots a histogram
plot_histogram(df)
Correlation Matrix using DataExplorer
- The correlation matrix shows the strength of the linear relationship between each pair of variables. As you may already know, a coefficient of 1 indicates a perfect positive correlation, -1 a perfect negative correlation, and 0 no linear relationship.
- So, let’s plot a correlation matrix for the data using the plot_correlation() function.
#Plots the correlation matrix
plot_correlation(df)
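If you want to see the numbers behind the heatmap, base R’s cor() can compute the same kind of coefficients for the numeric columns. Treat this as a sketch for cross-checking, not a description of how plot_correlation() works internally. The names num_cols and cor_mat are made up for this example, and the type = "continuous" argument restricts the plot to the numeric columns.

```r
library(DataExplorer)

df <- datasets::iris

# Keep only the numeric columns; cor() cannot handle the Species factor
num_cols <- vapply(df, is.numeric, logical(1))
cor_mat <- cor(df[, num_cols])
round(cor_mat, 2)

# Restrict the heatmap to the continuous columns for a like-for-like comparison
plot_correlation(df, type = "continuous")
```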
It has been said, and said well: “EDA is the Heart of any Analysis”. R offers many libraries that help you explore data, plot quick visualizations, and more. DataExplorer is one of them, and it helps you explore data quickly with beautiful visualizations. I hope you liked it as much as I loved it.
That’s all for now. Happy R!!!
Further reading: DataExplorer R documentation