Hardly a day in an analyst's life goes by without missing values. Yes, they exist. Generally, missing or null values end up in the data through human error or incorrect measurements. Whether you use R, Java, Python, or even Excel, you will find many ways to deal with them.
You can detect them, count them, and even highlight them. But have you ever thought of visualizing missing values? If yes, then you are awesome! In this story, let's focus on missingno, a Python library for visualizing missing values.
Dealing With Missing Values in Python
As I already said, whether you accept it or not, missing values are a part of data, and of life as well. You have to live with them. In Python, there are several ways to deal with missing or null values:
- Drop the entire row which includes missing values.
- Drop the entire column which has missing values.
- Fill the missing values with alternative data.
- Impute the missing data with mean or median.
But always find out why the values are missing and what they are trying to convey, because whenever we choose to drop values, we lose potentially useful information.
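The four options above can be sketched with plain pandas. This is a minimal illustration on a made-up toy table (the column names and values are assumptions, not the Housing data used later), not the only way to do it:

```python
import numpy as np
import pandas as pd

# Hypothetical toy data, invented for illustration only
df = pd.DataFrame({
    "price": [100, 200, 300, 400],                 # complete column
    "area": [1000.0, 1500.0, np.nan, 2000.0],      # one missing value
    "parking": [1.0, np.nan, 2.0, 3.0],            # one missing value
    "furnishing": ["furnished", None, "unfurnished", "furnished"],
})

# 1. Drop every row that contains at least one missing value
dropped_rows = df.dropna()

# 2. Drop every column that contains at least one missing value
dropped_cols = df.dropna(axis=1)

# 3. Fill missing values with an alternative value
filled = df.fillna({"furnishing": "unknown"})

# 4. Impute numeric columns with the mean or the median
imputed = df.copy()
imputed["area"] = imputed["area"].fillna(imputed["area"].mean())
imputed["parking"] = imputed["parking"].fillna(imputed["parking"].median())
```

Note how much data the first two options throw away even on this tiny table: half of the rows, or three of the four columns.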
Installing Missingno in Python
Fine, we agree that missing values will show up and that there are many meaningful ways to deal with them. So now let's install the missingno package with pip and import it. This is the library that helps us visualize missing values.
#install missingno
pip install missingno

#Import the library
import missingno as msnum
We also have to import a few dependencies to support the missingno library.
#import dependencies
import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
%matplotlib inline
That’s great! Our ammunition is ready and let’s roll!!!
Missingno in Python
- The missingno library offers several functions for plotting graphs that visualize the missing values in your data.
- It offers bar, matrix, and heatmap plots to visualize the missing values in the data.
- Your data may look messy or have many null values; worry not, missingno will make things look easy.
- It is a simple-to-use library with simple syntax.
- It offers clear missing value visuals.
Getting the Data
Well, we are going to use the Housing data for the illustration. We will read the data and check for missing values; if we find any, we will visualize them.
#read the data
import pandas as pd
data = pd.read_csv('Housing.csv')

#preview the first few rows
data.head()
That's good! I don't see any missing values in the first few rows 😛 Let's dig deeper!
#Shape of the data
data.shape
- The data has 545 rows and 13 columns / variables.
# datatypes
data.dtypes

price                 int64
area                  int64
bedrooms              int64
bathrooms           float64
stories             float64
mainroad             object
guestroom            object
basement             object
hotwaterheating      object
airconditioning      object
parking             float64
prefarea             object
furnishingstatus     object
dtype: object
- We got both categorical and quantitative attributes in our data.
#Missing values check - Boolean
data.isnull().any()

price               False
area                False
bedrooms            False
bathrooms            True
stories              True
mainroad            False
guestroom            True
basement            False
hotwaterheating      True
airconditioning     False
parking              True
prefarea            False
furnishingstatus    False
dtype: bool
- The Boolean check confirms that five columns contain missing values: bathrooms, stories, guestroom, hotwaterheating, and parking.
#count of missing values
data.isnull().sum()

price                0
area                 0
bedrooms             0
bathrooms           13
stories              6
mainroad             0
guestroom            8
basement             0
hotwaterheating     27
airconditioning      0
parking              7
prefarea             0
furnishingstatus     0
dtype: int64
- We got the count of missing values per column; in total we have 61 missing values in the data (13 + 6 + 8 + 27 + 7).
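Rather than adding up the per-column counts by hand, the total can be computed directly. The sketch below uses a small made-up table (hypothetical values, not the Housing data) to show the overall count and the share of missing values per column:

```python
import numpy as np
import pandas as pd

# Hypothetical toy data, invented for illustration only
df = pd.DataFrame({
    "bathrooms": [1.0, np.nan, 2.0, np.nan],
    "stories": [2.0, 3.0, np.nan, 1.0],
    "price": [100, 200, 300, 400],
})

# Missing values per column
per_column = df.isnull().sum()

# Grand total across all columns
total_missing = int(per_column.sum())

# Percentage of missing values per column (the mean of a Boolean mask is a fraction)
percent_missing = df.isnull().mean() * 100

print(per_column)
print("Total missing:", total_missing)
print(percent_missing)
```

On the real dataset, the same calls would be applied to `data` instead of the toy `df`.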
1. Missingno – Bar plot
Now, using the bar function of the missingno library, we are going to plot a bar graph that shows how complete each column of the data is.
#bar plot
import missingno as msnum
msnum.bar(data)
That's perfect! I feel so good to see a library visualizing missing values so meaningfully and beautifully. Each bar shows how many non-missing values a variable has, so a shorter bar means more missing values in that variable.
2. Missingno – Matrix plot
Yes, this library also provides a matrix plot to visualize the null values. Personally, I love this plot very much because it shows exactly where in the data the missing values sit.
#Matrix plot
import missingno as msnum
msnum.matrix(data)
This is one of the most beautiful plots I have ever seen. I hope you are now slowly falling in love with it. You know, sometimes you cannot resist something!
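The positions that a matrix plot draws as gaps can also be listed as plain numbers, which is handy when you want to inspect the affected rows. A minimal sketch on a made-up table (the values are assumptions for illustration):

```python
import numpy as np
import pandas as pd

# Hypothetical toy data, invented for illustration only
df = pd.DataFrame({
    "bathrooms": [1.0, np.nan, 2.0, 1.0],
    "parking": [np.nan, 1.0, 2.0, np.nan],
})

# Boolean mask: True wherever a value is missing
mask = df.isnull()

# Index labels of the rows that contain at least one missing value
rows_with_missing = df.index[mask.any(axis=1)].tolist()

# (row label, column name) pairs for every missing cell, in row order
row_pos, col_pos = np.where(mask.to_numpy())
positions = [(df.index[r], df.columns[c]) for r, c in zip(row_pos, col_pos)]

print(rows_with_missing)
print(positions)
```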
3. Missingno – Heatmaps
Finally, using this library we can plot the heatmaps of the missing values in the data. Let’s see how it works!
#Heatmaps
import missingno as msnum
msnum.heatmap(data)
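Here, the heatmap shows the nullity correlation between pairs of variables: how strongly the presence or absence of a value in one variable is tied to the presence or absence of a value in another.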
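To get a feel for the numbers behind such a heatmap, the relationship between missing values in two columns can be approximated in pandas by correlating the missingness indicators. This is a rough sketch on made-up data and an assumption about the general idea, not necessarily the exact computation missingno performs:

```python
import numpy as np
import pandas as pd

# Hypothetical toy data, invented for illustration only
df = pd.DataFrame({
    "bathrooms": [1.0, np.nan, 2.0, np.nan, 1.0, 3.0],
    "stories": [2.0, np.nan, 1.0, np.nan, 3.0, 2.0],          # missing in the same rows as bathrooms
    "parking": [np.nan, 1.0, np.nan, 2.0, np.nan, np.nan],    # missing exactly where bathrooms is present
    "price": [100, 200, 300, 400, 500, 600],                  # complete column
})

# True where a value is missing
null_mask = df.isnull()

# Columns that are never (or always) missing have no variance, so skip them
varying = null_mask.loc[:, null_mask.nunique() > 1]

# Correlation between the 0/1 missingness indicators of each pair of columns
nullity_corr = varying.astype(int).corr()

print(nullity_corr)
```

A value near 1 means two columns tend to be missing together, while a value near -1 means that when one is missing the other tends to be present.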
Missingno is one of the simplest and easiest-to-use Python libraries. You can use the three plot types covered here, bar, matrix, and heatmap, to visualize the missing values in your data.
I hope you enjoyed the story and that’s all for now! Happy Python!!
More read: Missing values