4 Easy Ways For Data Filtering In Python Pandas


The pandas library for Python is very helpful for data manipulation and analysis. But before we dive into analysis, we need to explore the data to get some initial insights. Many functions help with data manipulation, such as groupby, crosstab, and filter. In this article, we will focus on data filtering using pandas in Python.


What is Data Filtering?

In simple words, data filtering means choosing or extracting subsets of the data for analysis. There are multiple ways to filter the data in a given dataframe.

In this article, we will be focusing on 4 important data filtering functions.

  • filter()
  • Boolean indexing
  • query()
  • str.contains()

These are 4 major functions that you can use to filter the data as per your requirements. Let’s discuss each of them in the following sections.


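1. filter() function

The filter() function may cause some confusion if you are a beginner. It does not look at the values in the data at all. It selects by labels only, and for a dataframe it works on the column labels by default. To understand this, let’s see how the filter() function works.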

We will be working with the Titanic dataset, loaded into a dataframe named data.
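The article does not show how data is loaded, so here is a minimal sketch of one way to get a dataframe to experiment with. The five rows below are a small hand-written stand-in in the style of the Titanic data, not the full dataset, and the column names are an assumption. With a real CSV file on disk you would pass its path to pd.read_csv instead of the in-memory text.

```python
import io

import pandas as pd

# Illustrative stand-in rows (assumed columns, not the full Titanic dataset).
csv_text = """Name,Sex,Age,Survived
"Braund, Mr. Owen Harris",male,22,0
"Cumings, Mrs. John Bradley",female,38,1
"Heikkinen, Miss. Laina",female,26,1
"Futrelle, Mrs. Jacques Heath",female,35,1
"Allen, Mr. William Henry",male,35,0
"""

# read_csv accepts a file path or any file-like object.
data = pd.read_csv(io.StringIO(csv_text))

print(data.shape)
print(data.head())
```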

Let’s start with filtering the Sex column in the data.

# Keep only the 'Sex' column, selected by its label
data.filter(items=['Sex'])
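To make the "labels only" behaviour concrete, here is a small sketch using a hand-written stand-in frame (the rows and column names are illustrative assumptions, not the real dataset). It shows the three label-matching options of filter(): items for exact names, like for a substring of the label, and regex for a pattern. Notice that the number of rows never changes.

```python
import pandas as pd

# Small illustrative frame standing in for the Titanic data.
data = pd.DataFrame({
    "Name": ["Braund", "Cumings", "Heikkinen", "Futrelle", "Allen"],
    "Sex": ["male", "female", "female", "female", "male"],
    "Age": [22, 38, 26, 35, 35],
    "Survived": [0, 1, 1, 1, 0],
})

# Exact column names.
only_sex = data.filter(items=["Sex"])

# Column labels that contain the substring "ge".
has_ge = data.filter(like="ge")

# Column labels that match a regular expression (start with "S").
starts_with_s = data.filter(regex="^S")

print(list(only_sex.columns))
print(list(has_ge.columns))
print(list(starts_with_s.columns))

# filter() never drops rows based on values: all rows are still there.
print(len(only_sex) == len(data))
```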

If you want to filter on a particular value in a column, such as a specific value of Sex in our case, you can use Boolean indexing instead.


2. Boolean Indexing

Boolean indexing is one of the most useful data filtering methods. It tests every row against a condition and tells us whether that row meets it. To understand this, let’s see how Boolean indexing works.

# Boolean mask: True where Sex is 'male'
data['Sex'] == 'male'

Here, the comparison checks each row for the value ‘male’ and returns a series of Boolean values (True / False), one per row. A row gets True when its Sex value is ‘male’ and False otherwise.

You can then pass this Boolean series to the data[] selector to get a dataframe that contains only the rows marked True.

# Pass the Boolean mask to the selector to keep only the matching rows
data[data['Sex'] == 'male']

In the result, the ‘Sex’ column contains only ‘male’ values.
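Boolean masks can also be combined, which the single-condition example does not show. The sketch below uses a hand-written stand-in frame (illustrative rows and an assumed Age column) and combines conditions with & (and), | (or) and ~ (not). Each condition needs its own parentheses, because these operators bind more tightly than == and >.

```python
import pandas as pd

# Small illustrative frame standing in for the Titanic data.
data = pd.DataFrame({
    "Name": ["Braund", "Cumings", "Heikkinen", "Futrelle", "Allen"],
    "Sex": ["male", "female", "female", "female", "male"],
    "Age": [22, 38, 26, 35, 35],
})

# A mask is just a Boolean series with one value per row.
is_male = data["Sex"] == "male"
males = data[is_male]

# Combine two conditions with & ; note the parentheses.
women_over_30 = data[(data["Sex"] == "female") & (data["Age"] > 30)]

# Negate a mask with ~ .
not_male = data[~is_male]

# isin() tests membership in a list of allowed values.
selected_ages = data[data["Age"].isin([22, 26])]

print(males)
print(women_over_30)
```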


3. Querying

The query() function lets you express the same kind of filter more directly than Boolean indexing. You write the condition as a string, so there is no need to repeat the data[] selector. Let’s see how it works.

# query: filter rows with a condition written as a string
data.query("Sex == 'female'")

I hope it makes sense now. This is how query() works in pandas. It is a simpler and more direct method for filtering rows, and you can use it in place of Boolean indexing.
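As a rough sketch of how query() compares with Boolean indexing, the example below runs the same filter both ways on a hand-written stand-in frame (illustrative rows and an assumed Age column). It also shows two conveniences of query(): conditions joined with the word and, and Python variables referenced with the @ prefix. The variable name min_age is just an example.

```python
import pandas as pd

# Small illustrative frame standing in for the Titanic data.
data = pd.DataFrame({
    "Name": ["Braund", "Cumings", "Heikkinen", "Futrelle", "Allen"],
    "Sex": ["male", "female", "female", "female", "male"],
    "Age": [22, 38, 26, 35, 35],
})

# Two conditions in one expression string.
with_query = data.query("Sex == 'female' and Age > 30")

# The same filter written with Boolean indexing.
with_mask = data[(data["Sex"] == "female") & (data["Age"] > 30)]

# Refer to a Python variable inside the expression with @.
min_age = 30
older = data.query("Age > @min_age")

print(with_query.equals(with_mask))
print(older)
```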


4. str.contains

Sometimes the values are long, and it is hard to remember them in full. This happens mostly with names. In such cases, we can use the str.contains() function to filter the data by a fragment of the value.

# Keep rows whose 'Sex' value contains the substring "fem"
data[data['Sex'].str.contains("fem")]

You can observe that, given only the characters ‘fem’, the function returns all the matching rows. How cool is that!

I use this method a lot when I need to filter data. It is very easy: you just give the function a fragment of what you want, and it returns every row whose value contains that fragment.
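Because str.contains() matches substrings, the fragment you choose matters. The sketch below uses a hand-written stand-in frame (illustrative rows, with one missing Sex value added on purpose) to show a common surprise: "male" is also a substring of "female". It also uses the na and case parameters, which control how missing values and letter case are handled.

```python
import pandas as pd

# Small illustrative frame; one Sex value is deliberately missing.
data = pd.DataFrame({
    "Name": ["Braund", "Cumings", "Heikkinen", "Futrelle", "Allen"],
    "Sex": ["male", "female", "female", None, "male"],
})

# na=False treats missing values as "no match".
has_fem = data["Sex"].str.contains("fem", na=False)
women = data[has_fem]

# Careful: "male" is a substring of "female", so this matches both.
has_male = data["Sex"].str.contains("male", na=False)

# case=False ignores upper and lower case in the match.
has_fem_any_case = data["Sex"].str.contains("FEM", case=False, na=False)

print(women)
print(has_male.sum())
```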


Wrapping Up – Data Filtering

Data filtering is one of the most useful and important aspects of data manipulation and analysis. Instead of working through huge chunks of data, you can filter out small subsets and look for key insights. I have shown multiple methods for data filtering in Python. Let me know which is your go-to method for filtering data.

That’s all for now. Happy Python!!!

Further reading: Pandas data filtering
