Handle Missing Values Using Pandas in Python


Hello folks! If you work with data, you know how much time goes into cleaning it and how important clean data is for any further analysis. That being said, dealing with missing data, or NaNs, is essential. In Python, you can use Pandas for effective data cleaning and manipulation: just as R users reach for dplyr for data wrangling, Python users reach for Pandas. Today, we will talk about handling missing values using Pandas in Python.


Quick Points about Pandas

  • Pandas is a Python data analysis library.
  • For basic operations, you can read files and analyze data.
  • For intermediate operations, you can clean data, format data, and handle duplicates.
  • For advanced operations, you can create plots and compute correlations.


Handling missing values using Pandas

Pandas offers several functions for handling missing values in Python. Some identify null values, and others remove or replace them. Let's explore all of them.

isnull()

This function returns a Boolean mask that is True wherever a value is missing (None or NaN) and False everywhere else.

notnull()

This function is the exact opposite of isnull(): it returns True wherever a value is present and False wherever it is missing.

dropna()

This function drops the null values from the data (in a DataFrame, the rows or columns that contain them).

fillna()

This function fills the missing values with a value you choose, such as a constant or a statistic like the column mean.
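A quick note on what counts as "null" here. The small sketch below, using a made-up Series, suggests that both Python's None and NumPy's np.nan are treated as missing, while values such as 0 or an empty string are not. It also uses isna(), which pandas documents as an alias of isnull() (just as notna() is an alias of notnull()).

```python
import numpy as np
import pandas as pd

# Made-up Series mixing "falsy" but present values with real gaps
s = pd.Series([0, "", None, np.nan, "hi"])

print(s.isnull().tolist())   # [False, False, True, True, False]
print(s.notnull().tolist())  # [True, True, False, False, True]

# isna() gives the same result as isnull()
print(s.isna().equals(s.isnull()))  # True
```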


Identifying the Null Values in the Data

Well, we discussed various functions which help in handling missing values using Pandas in python. Now, let’s understand them in depth using some examples.

To identify the null values present in the data, we can use the isnull() and notnull() functions. Since both functions return Boolean masks, their output is Boolean (True / False).

Let’s check both of them.

First, we will see how isnull() works with an example.

#Identifies the Null values in the data

import pandas as pd
df = pd.Series([1,2,'hi',4,None,5])
df.isnull()
0    False
1    False
2    False
3    False
4     True
5    False
dtype: bool

That’s great!

The isnull() function detects the null values in the data and returns a Boolean Series: True at index 4, where the value is None, and False everywhere else.
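On a whole DataFrame, isnull() returns a Boolean frame of the same shape, so a common pattern is to sum it to count the missing values in each column. Here is a minimal sketch; the column names and values are made up purely for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical data with gaps in two of the three columns
df = pd.DataFrame({
    "name": ["Ann", "Bob", None, "Dee"],
    "age": [28, np.nan, 35, np.nan],
    "city": ["Oslo", "Rome", "Lima", "Pune"],
})

# Summing the Boolean frame counts the True values (missing cells) per column
missing_per_column = df.isnull().sum()
print(missing_per_column.to_dict())  # counts: name=1, age=2, city=0

# any(axis=1) flags rows that contain at least one missing value
rows_with_gaps = df[df.isnull().any(axis=1)]
print(rows_with_gaps.index.tolist())  # [1, 2, 3]
```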

The notnull() function works like isnull(), but in the opposite way. Let's see how it works.

# Identifies the non-null values in the data

import pandas as pd
df = pd.Series([1,2,'hi',4,None,5])
df.notnull()
0     True
1     True
2     True
3     True
4    False
5     True
dtype: bool

Perfect!

The notnull() function returns True for every value that is present and False for the missing value at index 4.
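Since notnull() returns a Boolean mask, you can also use it to filter a Series. The sketch below reuses the Series from the example above and checks that filtering with the mask selects the same entries as dropna(), which is covered next.

```python
import pandas as pd

s = pd.Series([1, 2, "hi", 4, None, 5])

# Keep only the entries where notnull() is True
present = s[s.notnull()]

print(present.index.tolist())      # [0, 1, 2, 3, 5]
print(present.equals(s.dropna()))  # True

# Summing the mask counts the non-missing values, like s.count()
print(s.notnull().sum(), s.count())  # 5 5
```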


Dropping Missing Values Using Pandas

We have seen how to identify missing values using Pandas. Now, let's look at how to handle the missing values we identified.

For this purpose, we will be using dropna() function.

# Drops the null values from the data

import pandas as pd
df = pd.Series([1,2,'hi',4,None,5])
df.dropna()
0     1
1     2
2    hi
3     4
5     5
dtype: object

You can observe that dropna() removed the null value at index 4, and that the remaining entries keep their original index labels. In the same way, you can use this function on your own dataset.
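On a DataFrame, dropna() drops whole rows by default, and its how, subset, thresh and axis parameters control what gets dropped. The following is a sketch on a made-up frame; the comments note which row labels (or columns) each call keeps.

```python
import numpy as np
import pandas as pd

# Made-up data: rows 0-4 hold 3, 2, 2, 1 and 0 non-missing values
df = pd.DataFrame({
    "a": [1.0, np.nan, 3.0, np.nan, np.nan],
    "b": [4.0, 5.0, np.nan, np.nan, np.nan],
    "c": [7.0, 8.0, 9.0, 10.0, np.nan],
})

any_missing = df.dropna()                # drop rows with any NaN: keeps row 0
all_missing = df.dropna(how="all")       # drop only all-NaN rows: keeps rows 0-3
by_subset = df.dropna(subset=["a"])      # check only column "a": keeps rows 0 and 2
at_least_two = df.dropna(thresh=2)       # need >= 2 non-NaN values: keeps rows 0-2
cols_kept = df.dropna(axis=1, thresh=4)  # columns with >= 4 non-NaN values: keeps "c"

# dropna() keeps the original index labels; reset_index(drop=True) renumbers from 0
renumbered = df.dropna(subset=["a"]).reset_index(drop=True)
print(renumbered.index.tolist())  # [0, 1]
```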


Filling Missing Values Using Pandas

Now, let’s see how we can fill the missing values present in the data. For this purpose, we can make use of fillna() function.

# Fills the null values in the data with 0

import pandas as pd
df = pd.Series([1,2,'hi',4,None,5])
df.fillna(0)
0     1
1     2
2    hi
3     4
4     0
5     5
dtype: object

Wow!

You can see that fillna(0) replaced the missing/null value at index 4 with the specified number 0. It is that simple. You should use all of these Pandas functions on your own dataset to handle missing values.
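fillna() accepts more than a single constant. The sketch below, on a made-up DataFrame, fills each column with its own value via a dict, fills a numeric column with its mean, and carries the last valid value forward with ffill(). In recent pandas versions, fillna(method="ffill") is deprecated in favour of calling ffill() directly.

```python
import numpy as np
import pandas as pd

# Hypothetical readings with gaps in both columns
df = pd.DataFrame({
    "temp": [21.0, np.nan, 23.0, np.nan],
    "city": ["Oslo", None, "Rome", "Lima"],
})

# A dict maps each column to its own fill value
filled_const = df.fillna({"temp": 0.0, "city": "unknown"})

# mean() skips NaN by default, so the gaps get the mean of 21.0 and 23.0
temp_mean = df["temp"].fillna(df["temp"].mean())
print(temp_mean.tolist())  # [21.0, 22.0, 23.0, 22.0]

# ffill() copies the last valid observation forward into each gap
temp_ffill = df["temp"].ffill()
print(temp_ffill.tolist())  # [21.0, 21.0, 23.0, 23.0]
```

Note that fillna() returns a new object here; the original df still contains its missing values.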

The Pandas library is fast and easy to use, and it offers many functions that make your work easier and better.


Ending Note

Well, Pandas is the go-to library for data analysis in Python. We talked about four functions that help in handling missing values using Pandas: isnull(), notnull(), dropna(), and fillna().

These functions have simple syntax, but they make a big difference in your work. So, don't forget to use them in your data cleaning tasks.

That’s all for now. Happy Python!

Further reading: Pandas documentation
