Pandas is very helpful for data manipulation and analysis in Python. But before we dive into analysis, we need to explore the data to get some initial insights. Many functions help with data manipulation, such as groupby, crosstab, and filter. In this article, we will focus on data filtering using pandas in Python.
What is Data Filtering?
In simple words, data filtering is choosing or extracting subsets of the data for analysis. There are multiple ways to filter the data from a given dataframe.
In this article, we will focus on 4 important data filtering methods.
- filter()
- Boolean indexing
- query()
- str.contains()
These are 4 major methods you can use to filter the data as per your requirements. Let’s discuss each of them in the following sections.
1. filter() function
The filter function may cause some confusion if you are a beginner. It filters on labels only (column labels by default for a dataframe), not on the values stored in the data. To understand this, let’s see how the filter function works.
We will be working on the Titanic dataset in this article. For your reference, here is the data we are working on.

Let’s start with filtering the Sex column in the data.
#filter
data.filter(['Sex'])
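Besides a list of exact labels, filter also accepts the like and regex parameters for matching labels by substring or pattern. The following is a minimal sketch, assuming a tiny made-up dataframe in place of the real Titanic data (the rows and the names by_items, by_like, and by_regex are illustrative only).

```python
import pandas as pd

# Tiny made-up stand-in for the Titanic data (not real passenger records).
data = pd.DataFrame({
    "Name": ["Smith, Mr. John", "Doe, Mrs. Jane", "Roe, Miss. Ann"],
    "Sex": ["male", "female", "female"],
    "Age": [22, 38, 26],
    "SibSp": [1, 1, 0],
})

# Keep columns by exact label.
by_items = data.filter(items=["Sex", "Age"])

# Keep columns whose label contains the substring "S".
by_like = data.filter(like="S")

# Keep columns whose label matches a regular expression (starts with "S").
by_regex = data.filter(regex="^S")

print(by_items.columns.tolist())
print(by_like.columns.tolist())
print(by_regex.columns.tolist())
```

In every case the number of rows stays the same, because only the labels are inspected, never the values.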

If you want to filter on a particular value in a column, such as the Sex column in our case, you can use Boolean indexing.
2. Boolean Indexing
Boolean indexing is one of the most useful data filtering methods. It checks every row against a condition and tells us whether the row meets it. To understand this, let’s see how Boolean indexing works.
#boolean
data['Sex'] == 'male'

Here, the comparison checks each row of the ‘Sex’ column and returns a series of Boolean values (True / False): True where the value is ‘male’ and False otherwise.
You can even pass this series to the data[] selector to get a dataframe containing only the matching rows.
#Selector
data[data['Sex'] == 'male']

You can observe that the ‘Sex’ column now has only ‘male’ values.
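Boolean series can also be combined, which is handy when one condition is not enough. Here is a minimal sketch, assuming a small made-up dataframe with ‘Sex’ and ‘Age’ columns (the values and variable names are illustrative). Note that pandas uses &, | and ~ for element-wise and, or and not, and that each comparison needs its own parentheses when written inline.

```python
import pandas as pd

# Made-up rows for illustration only.
data = pd.DataFrame({
    "Sex": ["male", "female", "female", "male"],
    "Age": [22, 38, 26, 35],
})

# Each comparison produces a Boolean series aligned with the rows.
is_male = data["Sex"] == "male"
over_30 = data["Age"] > 30

# AND: rows where both conditions hold.
men_over_30 = data[is_male & over_30]

# NOT: rows where the condition does not hold.
not_male = data[~is_male]

# OR: rows where at least one condition holds (.loc accepts the same mask).
male_or_over_30 = data.loc[is_male | over_30]

print(men_over_30)
print(not_male)
print(male_or_over_30)
```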
3. Querying
The query function lets you filter in a simpler, more direct way than Boolean indexing. You write the condition as a string, so there is no need to wrap it in the data[] selector. Let’s see how it works.
#query
data.query("Sex == 'female'")

I hope now it makes sense. This is how query works in pandas. It is a simpler and more direct method for filtering, and you can use it in place of Boolean indexing.
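Conditions inside a query string can be joined with and / or, and a Python variable can be referenced by prefixing it with @. A minimal sketch, assuming a small made-up dataframe and a hypothetical threshold min_age, alongside the equivalent Boolean indexing for comparison:

```python
import pandas as pd

# Made-up rows for illustration only.
data = pd.DataFrame({
    "Sex": ["male", "female", "female", "male"],
    "Age": [22, 38, 26, 35],
})

# Hypothetical threshold, referenced inside the query string with "@".
min_age = 30

# Two conditions joined with "and" in a single query string.
older_women = data.query("Sex == 'female' and Age > @min_age")

# The same filter written with Boolean indexing.
same_rows = data[(data["Sex"] == "female") & (data["Age"] > min_age)]

print(older_women)
print(older_women.equals(same_rows))
```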
4. str.contains
Sometimes the values are long, and it is hard to remember them in full. This mostly happens with names. In such cases, we can use the str.contains function to filter the data by a part of the value.
#string
data[data.Sex.str.contains("fem")]

You can observe that with just the characters ‘fem’, the function returns all the matching rows. How cool is that!
I use this method a lot when I need to filter data. It is very easy: you just give the function a hint about what you want, and it returns every row whose value contains that text.
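A few options of str.contains are worth knowing: case controls case sensitivity, na sets the result for missing values, and regex=False treats the pattern as plain text rather than a regular expression. The sketch below assumes a small made-up dataframe (the names and values are invented for illustration). It also shows one pitfall: searching for "male" matches "female" too, because it is a substring match.

```python
import pandas as pd

# Made-up rows for illustration only; one name is missing on purpose.
data = pd.DataFrame({
    "Name": ["Smith, Mr. John", "Doe, Mrs. Jane", None, "Roe, Miss. Ann"],
    "Sex": ["male", "female", "female", "Female"],
})

# case=False also matches "Female" with a capital F.
fem = data[data["Sex"].str.contains("fem", case=False)]

# Pitfall: "male" is a substring of "female", so every row matches here.
male_substring = data[data["Sex"].str.contains("male")]

# For an exact value, a plain comparison is safer.
exact_male = data[data["Sex"].str.lower() == "male"]

# na=False treats missing names as non-matches; regex=False makes the dot
# in "Mr." a literal dot, so "Mrs." is not matched.
mr_only = data[data["Name"].str.contains("Mr.", regex=False, na=False)]

print(fem)
print(len(male_substring))
print(exact_male)
print(mr_only)
```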
Wrapping Up – Data Filtering
Data filtering is one of the most useful and important aspects of data manipulation and analysis. Instead of working through the whole dataset at once, you can filter out small subsets and look for key insights. I have shown multiple methods for data filtering in pandas. Let me know which is your go-to method for filtering data.
That’s all for now. Happy Python!!!
Further reading: Pandas data filtering