Data exploration, or exploratory data analysis (EDA), is an integral part of any analysis project. It not only explores the data but also describes it, helping you understand your data and the features in it.
Exploring the data early on helps in the later model-building stages, and people usually spend a large share of their time on EDA. That said, we have already discussed many libraries that help with EDA.
Today, it’s time for the data_describe library available in Python.
So, without wasting much time on the introduction, let’s see how we can install this library and work with it.
Read more:
- QuickDA in Python: Explore Your Data In Seconds.
- Klib in Python – Speed Up Your Data Visualization.
1. Installing the data_describe library in Python
To install the data_describe library, run the pip command below. The leading ! runs it as a shell command from a Jupyter or Colab notebook cell; omit it when installing from a terminal.
#installation
!pip install data_describe

When the installation finishes, pip prints a success message on the last line of its output. After this, you have to import the library into Python to work with it.
#import
import data_describe as d_d
Perfect! You have successfully installed and imported the required library. Now, let’s see what it offers.
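If you want to confirm the install from a script rather than by reading pip's output, here is a minimal sketch using only the standard library. The helper name is_installed is made up for this example; it simply asks Python whether a module with that import name can be found.

```python
import importlib.util


def is_installed(module_name: str) -> bool:
    """Return True if a top-level module with this name can be found."""
    # find_spec returns None when no such top-level module exists.
    return importlib.util.find_spec(module_name) is not None


# "data_describe" is the import name used in this article.
print("data_describe available:", is_installed("data_describe"))
```

This only checks that the package can be located; it does not import it or verify its version.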
2. Load the Data
We need some data to explore, so we’ll work with the coffee sales data. I chose it because it is a real-world dataset that is large enough to be worth exploring.
You can download the dataset here.
#load the data
import pandas as pd
data = pd.read_csv('coffeesales.csv')
data.head(5)

Whoo! Our data is ready to explore.
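Before exploring, it can help to check that the file was parsed the way you expect. The sketch below uses a tiny in-memory CSV so that it runs without the download; the column names are invented for illustration and are not the real coffeesales.csv schema.

```python
import io

import pandas as pd

# Hypothetical stand-in for a few rows of a sales CSV (invented columns).
csv_text = """product,market,sales,profit
Coffee,East,120,45
Tea,West,95,
Espresso,East,210,80
"""

sample = pd.read_csv(io.StringIO(csv_text))

print(sample.shape)         # (rows, columns)
print(sample.dtypes)        # which columns were read as numbers vs. text
print(sample.isna().sum())  # missing values per column
```

The same three checks work on the real data frame loaded with read_csv above.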
3. Summary (Statistical) of the Data
It is important to understand the statistical summary of the data. It uncovers the min, max, and median values, along with the unique and null value counts.
#summary
d_d.data_summary(data)

The above line of code returns a small block of info followed by a brief summary of the data. Note that statistics such as min, max, and median are computed only for numerical attributes, so those cells are blank for the categorical attributes.
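To get a feel for what such a summary contains, here is a rough sketch that builds a similar table with plain pandas. It is not how data_describe computes its summary, just an illustration on a made-up data frame with invented column names.

```python
import pandas as pd

# Hypothetical toy data; column names are invented for illustration.
df = pd.DataFrame({
    "sales": [120, 95, 210, 95],
    "profit": [45.0, None, 80.0, 30.0],
    "market": ["East", "West", "East", "South"],
})

# min/median/max only make sense for numeric columns.
numeric = df.select_dtypes(include="number")

summary = pd.DataFrame({
    "nulls": df.isna().sum(),     # missing values per column
    "unique": df.nunique(),       # distinct non-null values per column
    "min": numeric.min(),
    "median": numeric.median(),
    "max": numeric.max(),
})
print(summary)
```

The text column still gets null and unique counts, while its min, median, and max cells stay empty (NaN), which mirrors the blanks described above.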
4. Heatmap
Yes, you can plot a heatmap for the whole dataset using the data_heatmap function offered by the data_describe library. Let’s see how it works.
#heatmap
d_d.data_heatmap(data)

Here is our beautiful heatmap. The best thing about this library is that it offers many functions that help us explore the data, and that too with one line of code :P.
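A heatmap over a whole dataset is easier to read when the columns share a common scale. The sketch below assumes the values are standardized (z-scored) per numeric column before plotting; that is my assumption for illustration, not something confirmed in this article, and the toy columns are invented.

```python
import pandas as pd

# Hypothetical toy data; column names are invented for illustration.
df = pd.DataFrame({
    "sales": [120.0, 95.0, 210.0, 95.0],
    "profit": [45.0, 20.0, 80.0, 30.0],
    "market": ["East", "West", "East", "South"],
})

numeric = df.select_dtypes(include="number")

# z-score: subtract each column's mean, divide by its population std.
standardized = (numeric - numeric.mean()) / numeric.std(ddof=0)
print(standardized.round(2))
```

After this step every numeric column has mean 0 and standard deviation 1, so a single color scale can be shared across columns with very different units.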
5. Correlation Matrix
The correlation matrix displays the correlation between the attributes in the data. Its rows and columns both represent the attributes, and each cell holds the correlation between the row attribute and the column attribute.
#correlation
d_d.correlation_matrix(data)

As usual, all this happens with one line of code 🙂
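If you want the numbers behind such a plot, pandas can compute a correlation matrix directly. This is a minimal sketch on made-up data (the column names are invented); it uses the Pearson correlation, which is the pandas default, and may not match the exact settings data_describe uses.

```python
import pandas as pd

# Hypothetical toy data; column names are invented for illustration.
df = pd.DataFrame({
    "sales": [100, 150, 200, 250, 300],
    "profit": [20, 30, 40, 50, 60],     # rises in step with sales
    "discount": [25, 20, 15, 10, 5],    # falls as sales rise
    "market": ["East", "West", "East", "South", "West"],
})

# Keep numeric columns only, then compute pairwise Pearson correlations.
corr = df.select_dtypes(include="number").corr()
print(corr.round(2))
```

Values near 1 mean two attributes rise together, values near -1 mean one falls as the other rises, and the diagonal is always 1 because every attribute correlates perfectly with itself.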
6. Scatter Plots
Scatter plots use Cartesian coordinates to display the data values on the plot. They are used to explore the relationship between two numerical variables. Let’s see how we can plot a scatter graph using the scatter_plots function from the data_describe library.
#scatter plots
d_d.scatter_plots(data, plot_mode='matrix')

You can also call this plot a scatter matrix. Here I have passed the plot_mode argument as 'matrix'. You can try passing different arguments to the scatter_plots function.
7. Clustering
Data points that show similar features can be grouped into a cluster, and a dataset can contain multiple clusters.
Cluster plots help us visualize these clusters in the data.
#cluster plots
d_d.cluster(data)

That’s cool! We can see 3 different clusters in this data according to their behavior. You can spot the clusters in the scatter plots as well, but cluster plots serve the purpose better.
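To make the idea of clustering concrete, here is a small k-means sketch written with NumPy. I am using k-means as a stand-in; this article does not say which algorithm data_describe runs, and the function name kmeans, the toy blobs, and the simple farthest-point initialization are all choices made for this example.

```python
import numpy as np


def kmeans(points: np.ndarray, k: int, n_iter: int = 20) -> np.ndarray:
    """Minimal k-means: return a cluster label for each row of `points`."""
    # Deterministic farthest-point initialization: start from the first row,
    # then repeatedly add the point farthest from all chosen centers.
    centers = [points[0]]
    for _ in range(k - 1):
        dists = np.min(
            [np.linalg.norm(points - c, axis=1) for c in centers], axis=0
        )
        centers.append(points[np.argmax(dists)])
    centers = np.array(centers, dtype=float)

    labels = np.zeros(len(points), dtype=int)
    for _ in range(n_iter):
        # Assign each point to its nearest center.
        d = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=2)
        labels = d.argmin(axis=1)
        # Move each center to the mean of the points assigned to it.
        for j in range(k):
            if np.any(labels == j):
                centers[j] = points[labels == j].mean(axis=0)
    return labels


# Three made-up, well-separated blobs of 20 points each.
rng = np.random.default_rng(0)
blobs = np.vstack([
    rng.normal(loc=(0.0, 0.0), scale=0.5, size=(20, 2)),
    rng.normal(loc=(10.0, 0.0), scale=0.5, size=(20, 2)),
    rng.normal(loc=(0.0, 10.0), scale=0.5, size=(20, 2)),
])

labels = kmeans(blobs, k=3)
print(np.bincount(labels))  # number of points in each cluster
```

Real data is rarely this cleanly separated, so treat the number of clusters as something to experiment with rather than a fixed answer.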
8. Feature Importance Plot
We already know that not all the features in our data will contribute to our purpose. So, it is very important to find the most relevant features for our analysis or modeling.
This is where feature importance plots come in: they display the most important features in our dataset.
#feature importance
d_d.importance(data, 'sales')

Basically, it estimates the importance of the features with respect to the ‘sales’ attribute in the data. For this, the data_describe library offers the importance function shown above.
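For intuition about ranking features against a target, here is a deliberately simple sketch that orders numeric features by the absolute value of their correlation with the target. This is a crude proxy I chose for illustration, not the method data_describe uses, and the column names are invented.

```python
import pandas as pd

# Hypothetical toy data; column names are invented for illustration.
df = pd.DataFrame({
    "sales": [100, 150, 200, 250, 300, 350],
    "ad_spend": [10, 16, 19, 26, 30, 34],   # tracks sales closely
    "store_age": [5, 3, 6, 2, 6, 4],        # little relation to sales
})

target = "sales"
numeric = df.select_dtypes(include="number")

# Correlate every feature with the target, drop the target itself,
# and sort by strength regardless of sign.
ranking = (
    numeric.corr()[target]
    .drop(target)
    .abs()
    .sort_values(ascending=False)
)
print(ranking)
```

Correlation only captures linear, one-feature-at-a-time relationships, so a model-based importance score can rank features quite differently.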
Wrapping Up – Data Describe
data_describe is one of the quickest and easiest libraries you can use to explore your data. I personally enjoyed using it. It offers many useful functions and saves a lot of time. I hope you find this library useful, and don’t forget to give it a try in your upcoming analysis work.
That’s all for now. Happy Python!!!
Read more: Official documentation of the library