Missingno – Visualize Missing Values In Python


An analyst’s day rarely goes by without running into missing values. Yes, they exist. Generally, missing or null values creep into the data because of human error or incorrect measurements. Whether you use R, Java, Python, or even Excel, you will find many ways to deal with them.

You can detect them, count them, and even highlight them. But have you ever thought of visualizing missing values? If yes, then you are awesome! In this story, let’s focus on missingno, a Python library for visualizing missing values.


Dealing With Missing Values in Python

As I said, whether you accept it or not, missing values are a part of data, and of life as well. You have to live with them. In Python, there are several ways to deal with missing or null values:

  • Drop the entire row which includes missing values.
  • Drop the entire column which has missing values.
  • Fill the missing values with alternative data.
  • Impute the missing data with mean or median.

But always ask why the values are missing and what they are trying to convey, because whenever we choose to drop values, we may lose useful information.
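
The four options above can be sketched with plain pandas. This is a minimal illustration on a made-up toy frame (the column values are hypothetical, not taken from the Housing data), and the placeholder string "unknown" is just one possible fill value.

```python
import numpy as np
import pandas as pd

# Toy frame with a few gaps (hypothetical values, for illustration only).
df = pd.DataFrame({
    "price": [100, 200, 300, 400],
    "bathrooms": [1.0, np.nan, 2.0, 3.0],
    "guestroom": ["yes", "no", None, "no"],
})

# 1. Drop every row that contains at least one missing value.
dropped_rows = df.dropna()

# 2. Drop every column that contains at least one missing value.
dropped_cols = df.dropna(axis=1)

# 3. Fill a categorical column with alternative data (a placeholder label).
filled = df.copy()
filled["guestroom"] = filled["guestroom"].fillna("unknown")

# 4. Impute a numeric column with its median (use .mean() for the mean).
filled["bathrooms"] = filled["bathrooms"].fillna(filled["bathrooms"].median())

print(dropped_rows.shape, dropped_cols.shape)
print(filled)
```

Notice how much data the first two options throw away on even a tiny frame, which is why it pays to understand the gaps before dropping anything.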


Installing Missingno in Python

Fine, we agree that missing values will show up and that there are many meaningful ways to deal with them. Now let’s install the missingno package with pip; it helps in visualizing missing values.

#install missingno (run this in a terminal, or prefix it with ! inside a notebook cell)

pip install missingno

#import the library

import missingno as msnum

We also import a few supporting libraries. Note that %matplotlib inline is a Jupyter notebook magic; leave it out if you run the code as a plain Python script.

#import dependencies

import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
%matplotlib inline 

That’s great! Our ammunition is ready, so let’s roll!


Missingno in Python

  • The missingno library in Python offers several functions for plotting graphs that visualize missing values.
  • It offers bar, matrix, and heatmap plots, among others.
  • Your data may look messy or have many null values; worry not, missingno makes them easy to inspect.
  • It is a simple-to-use library with simple syntax.
  • It produces clear missing-value visuals.

Getting the Data

Well, we are going to use the Housing data for the illustration. We will read the data, check for missing values, and visualize any that we find.

#read the data

import pandas as pd

data = pd.read_csv('Housing.csv')
data.head()

[Figure: first rows of the Housing data]

That’s good! I don’t see any missing values in the first few rows 😛 Let’s dig deeper!

#Shape of the data

data.shape

(545, 13)

  • The data has 545 rows and 13 columns (variables).

#datatypes

data.dtypes
price                 int64
area                  int64
bedrooms              int64
bathrooms           float64
stories             float64
mainroad             object
guestroom            object
basement             object
hotwaterheating      object
airconditioning      object
parking             float64
prefarea             object
furnishingstatus     object
dtype: object
  • We have both categorical and quantitative attributes in our data.

#Missing values check - Boolean

data.isnull().any()
price               False
area                False
bedrooms            False
bathrooms            True
stories              True
mainroad            False
guestroom            True
basement            False
hotwaterheating      True
airconditioning     False
parking              True
prefarea            False
furnishingstatus    False
dtype: bool
  • The logical test confirms that several columns contain missing values.

#count of missing values

data.isnull().sum()
price                0
area                 0
bedrooms             0
bathrooms           13
stories              6
mainroad             0
guestroom            8
basement             0
hotwaterheating     27
airconditioning      0
parking              7
prefarea             0
furnishingstatus     0
dtype: int64
  • We got the count of missing values per column. Adding them up (13 + 6 + 8 + 27 + 7), we have 61 missing values in the data in total.
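
If you want the grand total and the share of missing values per column without adding things up by hand, pandas can do it directly. Here is a small sketch on a made-up toy frame (hypothetical values, not the Housing data); the variable names are only for illustration.

```python
import numpy as np
import pandas as pd

# Toy frame with a few gaps (hypothetical values, for illustration only).
df = pd.DataFrame({
    "area": [1000, 1500, 1200, 900, 1100],
    "bathrooms": [1.0, np.nan, 2.0, np.nan, 1.0],
    "parking": [np.nan, 1.0, 0.0, 2.0, 1.0],
})

per_column = df.isnull().sum()               # missing count per column
total_missing = int(per_column.sum())        # grand total across the whole frame
percent_missing = df.isnull().mean() * 100   # percentage of missing values per column

print(per_column)
print("Total missing:", total_missing)
print(percent_missing.round(1))
```

The percentage view is handy when columns have very different numbers of gaps, because it tells you quickly which ones are worth worrying about.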

1. Missingno – Bar plot

Now, using the bar function of the missingno library, we are going to plot a bar graph of the missing values in the data.

#bar plot

import missingno as msnum 
msnum.bar(data)

[Figure: Missing Values Bar]

That’s perfect! I feel so good to see a library visualizing the missing values so meaningfully and beautifully. Each bar shows how many values are present in a variable, so the shorter bars point to the variables with missing values.
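
To read the bar plot with confidence, it helps to know the numbers behind it. As far as I can tell, the bars reflect the number of non-missing values per column, which you can reproduce with plain pandas. The sketch below uses a made-up toy frame (hypothetical values), so treat it as an approximation of what the plot summarizes rather than the library's own code.

```python
import numpy as np
import pandas as pd

# Toy frame with a few gaps (hypothetical values, for illustration only).
df = pd.DataFrame({
    "area": [1000, 1500, 1200, 900, 1100],
    "bathrooms": [1.0, np.nan, 2.0, np.nan, 1.0],
    "parking": [np.nan, 1.0, 0.0, 2.0, 1.0],
})

non_null_counts = df.count()               # present (non-missing) values per column
completeness = non_null_counts / len(df)   # same thing as a fraction of all rows

print(non_null_counts)
print(completeness)
```

A column with a completeness of 1.0 has no gaps at all; anything lower shows up as a shorter bar.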


2. Missingno – Matrix plot

Yes, this library also provides a matrix plot to visualize the null values. Personally, I love this plot very much because it shows exactly where the missing values sit in the data: each white gap in a column marks a missing value.

#Matrix plot 

import missingno as msnum 
msnum.matrix(data)

[Figure: Matrix Plot Missing Values]

It is one of the most beautiful plots I have ever seen. I hope you are slowly falling in love with it by now. You know, sometimes you cannot resist something!
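
The matrix plot is essentially a picture of the True/False mask that isnull() returns. If you need the exact positions as data rather than as a picture, a few lines of pandas and NumPy will do. This is a minimal sketch on a made-up toy frame (hypothetical values); the names rows_with_gaps and cells are only for illustration.

```python
import numpy as np
import pandas as pd

# Toy frame with a few gaps (hypothetical values, for illustration only).
df = pd.DataFrame({
    "area": [1000, 1500, 1200, 900, 1100],
    "bathrooms": [1.0, np.nan, 2.0, np.nan, 1.0],
    "parking": [np.nan, 1.0, 0.0, 2.0, 1.0],
})

mask = df.isnull()   # True wherever a value is missing

# Index labels of the rows that contain at least one missing value.
rows_with_gaps = df.index[mask.any(axis=1)].tolist()

# (row position, column name) for every missing cell, in row order.
cells = [(int(r), df.columns[c]) for r, c in np.argwhere(mask.to_numpy())]

print(rows_with_gaps)
print(cells)
```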


3. Missingno – Heatmaps

Finally, using this library we can plot the heatmaps of the missing values in the data. Let’s see how it works!

#Heatmaps

import missingno as msnum 
msnum.heatmap(data)

[Figure: Heatmaps Missingno]

That’s cool!

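Here, the heatmap shows the nullity correlation between pairs of variables, that is, how strongly the presence or absence of one variable goes along with the presence or absence of another. Values range from -1 (when one variable is missing, the other is present) to 1 (both variables are missing together).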
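
The numbers behind such a heatmap can be approximated with pandas alone by correlating the isnull() masks of the columns. The sketch below is my own illustration on a made-up toy frame (hypothetical values), not the library's internal code; it assumes that columns with no gaps at all are left out, since a constant mask has no defined correlation.

```python
import numpy as np
import pandas as pd

# Toy frame (hypothetical values, for illustration only).
df = pd.DataFrame({
    "area": [1000, 1500, 1200, 900, 1100, 1300],
    # bathrooms and stories are missing in exactly the same rows.
    "bathrooms": [1.0, np.nan, 2.0, np.nan, 1.0, 2.0],
    "stories": [2.0, np.nan, 1.0, np.nan, 3.0, 1.0],
    # parking is present exactly where bathrooms is missing.
    "parking": [np.nan, 1.0, np.nan, 2.0, np.nan, np.nan],
})

nullity = df.isnull()

# Keep only the columns that are partly missing (mask is not constant).
partial = nullity.loc[:, nullity.nunique() > 1]

# Pairwise correlation of the missing/present masks.
nullity_corr = partial.corr()

print(nullity_corr.round(2))
```

Here bathrooms and stories correlate at 1 because they go missing together, while bathrooms and parking correlate at -1 because one is present whenever the other is missing.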


Wrapping Up

The missingno library in Python is simple and easy to use. In this story, we used three of its plot types (bar, matrix, and heatmap) to visualize the missing values in the data.

I hope you enjoyed the story and that’s all for now! Happy Python!!

