
How To Use Python pandas dropna() to Drop NA Values from DataFrame

Published on August 3, 2022

By Pankaj

Introduction

In this tutorial, you’ll learn how to use the pandas DataFrame dropna() function.

NA stands for “Not Available”. In pandas, None, numpy.nan, and pandas.NaT are all treated as NA (null) values. Calling dropna() removes the rows or columns that contain these values, which leaves you with only the complete data.
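As a quick check of which values count as NA, the following minimal sketch uses pandas.isna() on individual values. The dictionary names are hypothetical and are used only for illustration.

```python
import numpy as np
import pandas as pd

# Values that pandas treats as missing (NA).
candidates = {"None": None, "numpy.nan": np.nan, "pandas.NaT": pd.NaT}
na_flags = {label: bool(pd.isna(value)) for label, value in candidates.items()}
print(na_flags)

# Empty strings and zeros are ordinary values, not NA.
not_na_flags = {"empty string": bool(pd.isna("")), "zero": bool(pd.isna(0))}
print(not_na_flags)
```

Every entry in na_flags is True and every entry in not_na_flags is False, so dropna() would leave empty strings and zeros in place.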

By default, this function returns a new DataFrame and the source DataFrame remains unchanged.

This tutorial was verified with Python 3.10.9, pandas 1.5.2, and NumPy 1.24.1.

Syntax

dropna() takes the following parameters:

dropna(self, axis=0, how="any", thresh=None, subset=None, inplace=False)

axis: {0 (or 'index'), 1 (or 'columns')}, default 0
If 0, drop rows with missing values.
If 1, drop columns with missing values.
how: {'any', 'all'}, default 'any'
If 'any', drop the row or column if any of the values is NA.
If 'all', drop the row or column if all of the values are NA.
thresh: (optional) an int value. Rows or columns are kept only if they contain at least this many non-NA values.
subset: (optional) a label or sequence of labels along the other axis to check for NA values. For example, when dropping rows, these are the columns to check.
inplace: (optional) a bool value, default False.
If True, the source DataFrame is changed and None is returned.
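To illustrate the axis parameter described above, here is a small sketch showing that the numeric and string forms are interchangeable. The DataFrame and variable names are hypothetical and separate from the tutorial's sample data.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data: column "a" has one missing value.
df = pd.DataFrame({"a": [1.0, np.nan, 3.0], "b": [4.0, 5.0, 6.0]})

# axis=0 and axis="index" both drop rows that contain NA values.
rows_by_number = df.dropna(axis=0)
rows_by_name = df.dropna(axis="index")

# axis=1 and axis="columns" both drop columns that contain NA values.
cols_by_number = df.dropna(axis=1)
cols_by_name = df.dropna(axis="columns")

print(rows_by_number.equals(rows_by_name))
print(cols_by_number.equals(cols_by_name))
print(list(rows_by_name.index), list(cols_by_name.columns))
```

The string forms can make the intent easier to read, since axis=1 is easy to confuse with "the first row".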

Constructing Sample DataFrames

Construct a sample DataFrame that contains valid and invalid values:

dropnaExample.py

import pandas as pd
import numpy as np

d1 = {
    'Name': ['Shark', 'Whale', 'Jellyfish', 'Starfish'],
    'ID': [1, 2, 3, 4],
    'Population': [100, 200, np.nan, pd.NaT],
    'Regions': [1, None, pd.NaT, pd.NaT]
}

df1 = pd.DataFrame(d1)
print(df1)

This code will print out the DataFrame:

Output
        Name  ID Population Regions
0      Shark   1        100       1
1      Whale   2        200    None
2  Jellyfish   3        NaN     NaT
3   Starfish   4        NaT     NaT

Then add a second DataFrame with additional rows and columns with NA values:

d2 = {
    'Name': ['Shark', 'Whale', 'Jellyfish', 'Starfish', pd.NaT],
    'ID': [1, 2, 3, 4, pd.NaT],
    'Population': [100, 200, np.nan, pd.NaT, pd.NaT],
    'Regions': [1, None, pd.NaT, pd.NaT, pd.NaT],
    'Endangered': [pd.NaT, pd.NaT, pd.NaT, pd.NaT, pd.NaT]
}

df2 = pd.DataFrame(d2)
print(df2)

This will output a new DataFrame:

Output
        Name   ID Population Regions Endangered
0      Shark    1        100       1        NaT
1      Whale    2        200    None        NaT
2  Jellyfish    3        NaN     NaT        NaT
3   Starfish    4        NaT     NaT        NaT
4        NaT  NaT        NaT     NaT        NaT

You will use the preceding DataFrames in the examples that follow.

Dropping All Rows with Missing Values

Use dropna() to remove rows with any None, NaN, or NaT values:

dropnaExample.py

dfresult = df1.dropna()
print(dfresult)

This will output:

Output
    Name  ID Population Regions
0  Shark   1        100       1

The result is a new DataFrame containing the single row that had no NA values.
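To see what the default row-wise drop is doing, the sketch below builds the same result with a boolean mask. The sample data and variable names are hypothetical, and the mask is only an illustration of the behavior, not a description of how pandas implements dropna() internally.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data with missing values in two different rows.
df = pd.DataFrame({
    "Name": ["Shark", "Whale", "Jellyfish"],
    "Population": [100.0, np.nan, 300.0],
    "Regions": [1.0, 2.0, np.nan],
})

# True for each row in which every value is present.
complete_rows = df.notna().all(axis=1)

via_mask = df[complete_rows]
via_dropna = df.dropna()

print(via_mask.equals(via_dropna))
print(list(via_dropna.index))
```

Note that the surviving rows keep their original index labels; dropna() does not renumber the index.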

Dropping All Columns with Missing Values

Use dropna() with axis=1 to remove columns with any None, NaN, or NaT values:

dfresult = df1.dropna(axis=1)
print(dfresult)

The columns with any None, NaN, or NaT values will be dropped:

Output
        Name  ID
0      Shark   1
1      Whale   2
2  Jellyfish   3
3   Starfish   4

The result is a new DataFrame with the two columns, Name and ID, that contained only non-NA values.
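Before dropping columns, it can help to check which ones would be affected. The following sketch counts NA values per column with isna().sum(); the sample data and variable names are hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data: only "Population" has a missing value.
df = pd.DataFrame({
    "Name": ["Shark", "Whale", "Jellyfish"],
    "ID": [1, 2, 3],
    "Population": [100.0, np.nan, 300.0],
})

# Number of NA values in each column.
na_counts = df.isna().sum()
columns_with_na = na_counts[na_counts > 0].index.tolist()

# Columns that survive dropna(axis=1).
remaining = df.dropna(axis=1).columns.tolist()

print(columns_with_na)
print(remaining)
```

A single missing value is enough to remove an entire column with the default how='any', so this kind of check is a cheap safeguard against discarding more data than intended.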

Dropping Rows or Columns if `all` the Values are `Null` with `how`

Use the second DataFrame with how='all' to drop rows only when every value in the row is NA:

dropnaExample.py

dfresult = df2.dropna(how='all')
print(dfresult)

The rows with all values equal to NA will be dropped:

Output
        Name ID Population Regions Endangered
0      Shark  1        100       1        NaT
1      Whale  2        200    None        NaT
2  Jellyfish  3        NaN     NaT        NaT
3   Starfish  4        NaT     NaT        NaT

The fifth row was dropped.

Next, use how and specify the axis:

dropnaExample.py

dfresult = df2.dropna(how='all', axis=1)
print(dfresult)

The columns with all values equal to NA will be dropped:

Output
        Name   ID Population Regions
0      Shark    1        100       1
1      Whale    2        200    None
2  Jellyfish    3        NaN     NaT
3   Starfish    4        NaT     NaT
4        NaT  NaT        NaT     NaT

The fifth column was dropped.
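The difference between how='any' and how='all' is easiest to see side by side. This is a minimal sketch on hypothetical data, separate from the tutorial's DataFrames.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data: every row has at least one NA,
# row 2 is entirely NA, and column "c" is entirely NA.
df = pd.DataFrame({
    "a": [1.0, np.nan, np.nan],
    "b": [2.0, 5.0, np.nan],
    "c": [np.nan, np.nan, np.nan],
})

any_rows = df.dropna(how="any")          # drops every row here
all_rows = df.dropna(how="all")          # drops only the all-NA row
all_cols = df.dropna(how="all", axis=1)  # drops only the all-NA column

print(len(any_rows))
print(list(all_rows.index))
print(list(all_cols.columns))
```

With sparse data, how='any' can remove everything, while how='all' removes only rows or columns that carry no information at all.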

Dropping Rows or Columns if a Threshold is Crossed with `thresh`

Use the second DataFrame with thresh to drop rows that do not meet the threshold of at least 3 non-NA values:

dropnaExample.py

dfresult = df2.dropna(thresh=3)
print(dfresult)

The rows that do not have at least 3 non-NA values will be dropped:

Output
    Name ID Population Regions Endangered
0  Shark  1        100       1        NaT
1  Whale  2        200    None        NaT

The third, fourth, and fifth rows were dropped.
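Because thresh counts non-NA values, its effect can be reproduced with DataFrame.count(). The sketch below compares the two on hypothetical data; the threshold of 2 is an arbitrary choice for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data with 3, 2, 1, and 0 non-NA values per row.
df = pd.DataFrame({
    "a": [1.0, 2.0, np.nan, np.nan],
    "b": [1.0, np.nan, np.nan, np.nan],
    "c": [1.0, 2.0, 3.0, np.nan],
})

# Number of non-NA values in each row.
non_na_per_row = df.count(axis=1)

# Keep rows with at least 2 non-NA values, written two ways.
kept = df.dropna(thresh=2)
same = df[non_na_per_row >= 2]

print(non_na_per_row.tolist())
print(kept.equals(same))
```

Keep in mind that thresh is the minimum number of values required to keep a row, not the number of NA values that triggers a drop.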

Dropping Rows or Columns for a Specific `subset`

Use the second DataFrame with subset to drop rows with NA values in the Population column:

dropnaExample.py

dfresult = df2.dropna(subset=['Population'])
print(dfresult)

The rows that have Population with NA values will be dropped:

Output
    Name ID Population Regions Endangered
0  Shark  1        100       1        NaT
1  Whale  2        200    None        NaT

The third, fourth, and fifth rows were dropped.

You can also specify the index values in the subset when dropping columns from the DataFrame:

dropnaExample.py

dfresult = df2.dropna(subset=[1, 2], axis=1)
print(dfresult)

The columns that contain NA values in rows 1 and 2 will be dropped:

Output
        Name   ID
0      Shark    1
1      Whale    2
2  Jellyfish    3
3   Starfish    4
4        NaT  NaT

The third, fourth, and fifth columns were dropped.
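subset also accepts several labels and can be combined with how. The following sketch, on hypothetical data, contrasts dropping rows where either of two columns is missing with dropping rows where both are missing.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data: row 1 is missing Population only,
# row 2 is missing both Population and Regions.
df = pd.DataFrame({
    "Name": ["Shark", "Whale", "Jellyfish"],
    "Population": [100.0, np.nan, np.nan],
    "Regions": [1.0, 2.0, np.nan],
})

# Default how='any': drop a row if either listed column is NA.
either_missing = df.dropna(subset=["Population", "Regions"])

# how='all': drop a row only if both listed columns are NA.
both_missing = df.dropna(subset=["Population", "Regions"], how="all")

print(list(either_missing.index))
print(list(both_missing.index))
```

Columns outside the subset, such as Name here, are ignored when deciding which rows to drop.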

Changing the source DataFrame after Dropping Rows or Columns with `inplace`

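By default, dropna() does not modify the source DataFrame. However, in some cases, you may wish to update a large source DataFrame directly rather than keep a separate result. Use inplace=True to do so. This is mainly a convenience: pandas may still create intermediate copies internally, so memory savings are not guaranteed.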

dropnaExample.py

df1.dropna(inplace=True)
print(df1)

This code does not use a dfresult variable.

This will output:

Output
    Name  ID Population Regions
0  Shark   1        100       1

The original DataFrame has been modified.
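One common pitfall with inplace=True is assigning the return value, which is None. The sketch below shows this on hypothetical data, alongside the reassignment pattern that achieves a similar effect without inplace.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data with one missing value.
df = pd.DataFrame({"value": [1.0, np.nan, 3.0]})

# With inplace=True the DataFrame is modified and None is returned.
result = df.dropna(inplace=True)
print(result)
print(len(df))

# Alternative: reassign the returned DataFrame to the same name.
other = pd.DataFrame({"value": [1.0, np.nan, 3.0]})
other = other.dropna()
print(len(other))
```

Writing df = df.dropna(inplace=True) would therefore replace the DataFrame with None, so use either inplace=True or reassignment, but not both.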

Conclusion

In this article, you used the dropna() function to remove rows and columns with NA values.

Continue your learning with more Python and pandas tutorials: Python pandas Module Tutorial and pandas Drop Duplicate Rows.

References

pandas DataFrame dropna() API Doc

About the authors

Pankaj

author

Bradley Kouchi

editor

This work is licensed under a Creative Commons Attribution-NonCommercial- ShareAlike 4.0 International License.
