DataExplorer in R – EDA is Super Easy Now


Data, data and data. It is everywhere. If you work in a data-related role, you probably spend more than half of your time processing and exploring data. That is why exploring data thoroughly matters: it is how you understand it and extract key insights from it.

So, here comes EDA (Exploratory Data Analysis), where you dive into the data to see what it can tell you. As an R user, you are blessed, because R offers many packages that help you explore your data. Today, let’s talk about one of them: DataExplorer. With it, exploring data is not only easy but also fast!


What is DataExplorer in R?

Given how important data exploration is to any analysis, DataExplorer may well become your best mate from now on.

One of the main features of DataExplorer is that it can generate an EDA report on your data with a single call to the create_report() function.

But what if you want to explore the data yourself?

Worry not! DataExplorer also offers many individual functions for exploring data step by step. You can plot or inspect:

  • String / categorical data
  • Histograms
  • Density plots
  • Missing values
  • Basic data stats
  • Correlation heatmap and more.
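For string / categorical data, DataExplorer provides plot_bar(), which draws a bar chart of category frequencies for each discrete column; it is not demonstrated later in this article. As a rough base-R sketch of what those bars represent (an illustration, not DataExplorer's own code), the frequencies for the only discrete column in iris can be computed with table():

```r
# Base-R sketch of the frequencies a bar chart of a categorical column shows
df <- datasets::iris

# Count how many rows fall into each category of the discrete column
species_counts <- table(df$Species)
print(species_counts)

# Relative frequencies (proportions) of each category
species_props <- prop.table(species_counts)
print(round(species_props, 3))
```

Each bar in a frequency bar chart corresponds to one of these counts.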

Installing DataExplorer in R

First things first!

Let’s install the DataExplorer package and load it into R by running the code below.

#Install required package
install.packages('DataExplorer')

#Load the library
library(DataExplorer)

Well, we are now ready to explore data with DataExplorer.


Import the Data

To perform EDA, we first need some data. I will use the iris dataset that ships with R, but don’t limit yourself to it; feel free to use any dataset available to you.

#Load the built-in iris dataset into R
df <- datasets::iris
#Print the first six rows
head(df)
Iris Data

Have a look over the data!
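If you would rather follow along with your own data, a CSV file can be read with base R’s read.csv(). So that this sketch runs anywhere, it first writes iris to a temporary file; the path csv_path is only a stand-in, and in practice you would point it at your own file.

```r
# Write iris to a temporary CSV so the example is self-contained;
# replace csv_path with the path to your own data file
csv_path <- tempfile(fileext = ".csv")
write.csv(datasets::iris, csv_path, row.names = FALSE)

# Read the data back; stringsAsFactors = TRUE turns text columns into factors
df <- read.csv(csv_path, stringsAsFactors = TRUE)

# Inspect the first rows and the column types
head(df)
str(df)
```

Without stringsAsFactors = TRUE, R 4.0 and later keep text columns as character vectors, which is also fine for exploration.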


Plot the structure of the Data

To quickly visualize the whole structure of the data, use the plot_str() function. I love this function because it shows the structure of the data at a glance, as shown below.

The plot is interactive: hovering over a node highlights it with a simple animation.

#Plots the structure of the data
plot_str(df)
plot_str() output

KYD – Know Your Data

  • Yep! First, you need to know your data. DataExplorer offers the introduce() function, which reports basic information about your data: its dimensions, column types, missing values, and memory usage.
  • Run the code below to get this result.
#Gives basic information about the data
introduce(df)
rows                         150
columns                      5
discrete_columns             1
continuous_columns           4
all_missing_columns          0
total_missing_values         0
complete_rows                150
total_observations           750
memory_usage                 7976
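To make those numbers concrete, here is a base-R sketch that recomputes most of them for iris. It reflects my reading of the fields (for example, that continuous columns are the numeric ones and discrete columns are everything else) rather than DataExplorer’s actual source, and it leaves out memory_usage, which depends on how R measures object size.

```r
df <- datasets::iris

# Classify columns: numeric columns as continuous, everything else as discrete
is_continuous <- vapply(df, is.numeric, logical(1))

intro_stats <- list(
  rows                 = nrow(df),
  columns              = ncol(df),
  discrete_columns     = sum(!is_continuous),
  continuous_columns   = sum(is_continuous),
  # Columns in which every value is NA
  all_missing_columns  = sum(colSums(is.na(df)) == nrow(df)),
  total_missing_values = sum(is.na(df)),
  # Rows without any NA
  complete_rows        = sum(complete.cases(df)),
  # Every cell in the data frame counts as one observation
  total_observations   = nrow(df) * ncol(df)
)
str(intro_stats)
```

Note that total_observations is simply rows times columns (150 × 5 = 750), not the number of rows.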
               

Plot the Data

  • Previously, you got the basic stats of your data as text. Now it’s time to visualize them with the plot_intro() function. Don’t wait!
#Plots the basic stats of the data
plot_intro(df)
plot_intro() output

Don’t Forget Missing Values

  • Missing values can hurt your analysis if they are not recognized and treated properly. Use the plot_missing() function to visualize the missing values, if any, in each column of the data.
#Plots the missing values in the data
plot_missing(df)
plot_missing() output
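Because iris has no missing values, it is a dull test case for this step. The sketch below (a base-R illustration, not DataExplorer’s implementation) blanks out some values in a copy of iris and computes the per-column missing count and share, which is the kind of information plot_missing() visualizes. DataExplorer’s profile_missing() returns a similar table directly.

```r
df_na <- datasets::iris

# Deliberately blank out some values so there is something to find
set.seed(42)
df_na$Sepal.Length[sample(nrow(df_na), 15)] <- NA
df_na$Petal.Width[sample(nrow(df_na), 30)]  <- NA

# Count and share of missing values per column
missing_profile <- data.frame(
  feature     = names(df_na),
  num_missing = colSums(is.na(df_na)),
  row.names   = NULL
)
missing_profile$pct_missing <- missing_profile$num_missing / nrow(df_na)
print(missing_profile)
```

sample() draws without replacement here, so exactly 15 and 30 distinct rows are blanked.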

Density Plot Using DataExplorer

  • Density plots help you understand the distribution of numerical variables. So, as part of exploring the data, let’s plot density graphs that show the distribution of each continuous column.
  • Use the plot_density() function for this purpose.
#Plots density graph
plot_density(df)
plot_density() output
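A density plot is a drawing of a kernel density estimate. As a base-R sketch of the underlying computation (not DataExplorer’s code; plot_density() builds its panels with ggplot2), stats::density() estimates the density of each continuous column, and the location of the highest peak hints at where most values lie:

```r
df <- datasets::iris

# Keep only the continuous (numeric) columns, as density plots need numbers
num_cols <- df[vapply(df, is.numeric, logical(1))]

# Estimate a Gaussian kernel density for each column (512 grid points by default)
densities <- lapply(num_cols, density)

# x value where each estimated density peaks (the mode of the estimate)
peaks <- vapply(densities, function(d) d$x[which.max(d$y)], numeric(1))
print(round(peaks, 2))
```

A single peak only summarizes the curve; Petal.Length, for example, has more than one bump, which is exactly what the plot makes visible.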

Histogram Using DataExplorer

  • Histograms are useful for spotting outliers and for understanding how the data is distributed across intervals (bins). Let’s plot histograms using the plot_histogram() function.
#Plots a histogram 
plot_histogram(df)
plot_histogram() output
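A histogram simply counts how many values fall into each bin. This base-R sketch uses hist() with plot = FALSE to return the bins and counts for one column instead of drawing them; it illustrates the idea and is not how plot_histogram() is implemented. The outlier check via boxplot.stats() (the usual 1.5 × IQR boxplot rule) is an extra step added here, not something DataExplorer applies.

```r
df <- datasets::iris

# Compute histogram bins and counts without drawing anything
h <- hist(df$Sepal.Width, plot = FALSE)
print(data.frame(
  from  = head(h$breaks, -1),
  to    = tail(h$breaks, -1),
  count = h$counts
))

# Flag values lying more than 1.5 * IQR beyond the hinges (the boxplot rule)
outliers <- boxplot.stats(df$Sepal.Width)$out
print(outliers)
```

Isolated values in the outermost bins of the histogram are usually the same ones the boxplot rule flags.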

Correlation Matrix Using DataExplorer

  • The correlation matrix shows the linear relationship between every pair of variables. A coefficient of 1 means a perfect positive linear relationship, -1 means a perfect negative one, and values near 0 mean little or no linear relationship.
  • So, let’s plot a correlation matrix for the data using the plot_correlation() function.
#Plots the correlation matrix
plot_correlation(df)
plot_correlation() output
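The numbers behind the heatmap are correlation coefficients. For the continuous columns, they can be computed directly with base R’s cor(), which uses Pearson correlation by default. cor() only accepts numeric input, so this sketch drops Species; plot_correlation() can also include discrete columns (its type argument chooses between "all", "discrete" and "continuous"), so its heatmap may contain extra rows for Species.

```r
df <- datasets::iris

# Pearson correlation matrix of the numeric columns only
num_cols <- df[vapply(df, is.numeric, logical(1))]
cor_mat <- cor(num_cols)
print(round(cor_mat, 2))

# Strongest off-diagonal relationship (ignore each variable's 1 with itself)
cor_off <- cor_mat
diag(cor_off) <- NA
strongest <- which(abs(cor_off) == max(abs(cor_off), na.rm = TRUE), arr.ind = TRUE)[1, ]
cat("Strongest pair:", rownames(cor_mat)[strongest[1]], "and",
    colnames(cor_mat)[strongest[2]], "\n")
```

The diagonal is always 1 because every variable is perfectly correlated with itself, which is why it is masked before searching for the strongest pair.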

Ending Note

As the saying goes, and rightly so: “EDA is the heart of any analysis.” R offers many libraries that help you explore data, plot quick visualizations, and more. DataExplorer is a great addition to them, letting you explore data quickly with beautiful visualizations. I hope you like it as much as I do.

That’s all for now. Happy R!!!

Further reading: DataExplorer R documentation
