1. Pandas drop() Function Syntax
Pandas DataFrame drop() function allows us to delete columns and rows. The drop() function syntax is:
drop(
self,
labels=None,
axis=0,
index=None,
columns=None,
level=None,
inplace=False,
errors="raise"
)
- labels: The labels to remove from the DataFrame. It’s used with ‘axis’ to identify rows or column names.
- axis: The possible values are {0 or ‘index’, 1 or ‘columns’}, default 0. It’s used with ‘labels’ to specify rows or columns.
- index: row labels to drop from the DataFrame. Equivalent to passing ‘labels’ with axis=0.
- columns: column labels to drop from the DataFrame. Equivalent to passing ‘labels’ with axis=1.
- level: the level from which the labels are removed, in case of a MultiIndex DataFrame.
- inplace: if True, the source DataFrame is changed and None is returned. The default value is False, in which case the source DataFrame remains unchanged and a new DataFrame object is returned.
- errors: the possible values are {‘ignore’, ‘raise’}, default ‘raise’. If the DataFrame doesn’t have the specified label, KeyError is raised. If we specify errors as ‘ignore’, the error is suppressed and only existing labels are removed.
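The level parameter is the least obvious one in the list above, so here is a minimal sketch of how it might be used. The two-level index and the 'Dept' values are made up for illustration; they are assumptions, not part of the tutorial data.

```python
import pandas as pd

# Hypothetical data: a two-level row index (department, employee name)
index = pd.MultiIndex.from_tuples(
    [('Exec', 'Pankaj'), ('Exec', 'Meghna'), ('Content', 'David')],
    names=['Dept', 'Name'],
)
df = pd.DataFrame({'ID': [1, 2, 3]}, index=index)

# Drop every row whose 'Name' level equals 'Meghna'
by_name = df.drop(index='Meghna', level='Name')
print(by_name)

# Drop every row whose 'Dept' level equals 'Exec'
by_dept = df.drop(index='Exec', level='Dept')
print(by_dept)
```

The level can be given by name, as here, or by its integer position.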
Let’s look at some examples of using the Pandas DataFrame drop() function.
2. Pandas Drop Columns
We can drop a single column as well as multiple columns from the DataFrame.
2.1) Drop Single Column
import pandas as pd
d1 = {'Name': ['Pankaj', 'Meghna', 'David'], 'ID': [1, 2, 3], 'Role': ['CEO', 'CTO', 'Editor']}
source_df = pd.DataFrame(d1)
print(source_df)
# drop single column
result_df = source_df.drop(columns='ID')
print(result_df)
Output:
     Name  ID    Role
0  Pankaj   1     CEO
1  Meghna   2     CTO
2   David   3  Editor
     Name    Role
0  Pankaj     CEO
1  Meghna     CTO
2   David  Editor
2.2) Drop Multiple Columns
result_df = source_df.drop(columns=['ID', 'Role'])
print(result_df)
Output:
     Name
0  Pankaj
1  Meghna
2   David
3. Pandas Drop Rows
Let’s look at some examples of dropping a single row and multiple rows from the DataFrame object.
3.1) Drop Single Row
import pandas as pd
d1 = {'Name': ['Pankaj', 'Meghna', 'David'], 'ID': [1, 2, 3], 'Role': ['CEO', 'CTO', 'Editor']}
source_df = pd.DataFrame(d1)
result_df = source_df.drop(index=0)
print(result_df)
Output:
     Name  ID    Role
1  Meghna   2     CTO
2   David   3  Editor
3.2) Drop Multiple Rows
result_df = source_df.drop(index=[1, 2])
print(result_df)
Output:
     Name  ID  Role
0  Pankaj   1   CEO
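In the examples above the row labels happen to be the default integers 0, 1, 2, which can hide the fact that drop() matches index labels rather than positions. The following sketch makes this visible by using the names as the index; this variant of the data is an assumption introduced only for illustration.

```python
import pandas as pd

# Hypothetical variant of the tutorial data: names are used as the row index
df = pd.DataFrame(
    {'ID': [1, 2, 3], 'Role': ['CEO', 'CTO', 'Editor']},
    index=['Pankaj', 'Meghna', 'David'],
)

# Rows are removed by label, so we pass the index value
result_df = df.drop(index='Meghna')
print(result_df)

# The integer 1 would be a position here, not a label, so drop() cannot find it
try:
    df.drop(index=1)
    position_worked = True
except KeyError:
    position_worked = False
print(position_worked)  # False
```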
4. Drop DataFrame Columns and Rows in place
We can specify inplace=True to drop columns and rows from the source DataFrame itself. In this case, None is returned from the drop() function call.
import pandas as pd
d1 = {'Name': ['Pankaj', 'Meghna', 'David'], 'ID': [1, 2, 3], 'Role': ['CEO', 'CTO', 'Editor']}
source_df = pd.DataFrame(d1)
source_df.drop(columns=['ID'], index=[0], inplace=True)
print(source_df)
Output:
     Name    Role
1  Meghna     CTO
2   David  Editor
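A common slip is to assign the result of an in-place call back to a variable. The sketch below, which reuses the tutorial data, shows what each variant returns; the variable names returned and new_df are chosen here only for illustration.

```python
import pandas as pd

d1 = {'Name': ['Pankaj', 'Meghna', 'David'], 'ID': [1, 2, 3], 'Role': ['CEO', 'CTO', 'Editor']}
source_df = pd.DataFrame(d1)

# With inplace=True the call returns None, so its result should not be assigned
returned = source_df.drop(columns=['ID'], inplace=True)
print(returned)                 # None
print(list(source_df.columns))  # ['Name', 'Role']

# With the default inplace=False a new DataFrame is returned
# and the source DataFrame is left untouched
new_df = source_df.drop(columns=['Role'])
print(list(new_df.columns))     # ['Name']
print(list(source_df.columns))  # ['Name', 'Role']
```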
5. Using labels and axis to drop columns and rows
Passing ‘labels’ together with ‘axis’ is not the recommended approach to delete rows and columns. It’s still good to know, because the ‘index’ and ‘columns’ parameters were only added to the drop() function in pandas version 0.21.0, so you may encounter the older form in existing code.
import pandas as pd
d1 = {'Name': ['Pankaj', 'Meghna', 'David'], 'ID': [1, 2, 3], 'Role': ['CEO', 'CTO', 'Editor']}
source_df = pd.DataFrame(d1)
# drop rows
result_df = source_df.drop(labels=[0, 1], axis=0)
print(result_df)
# drop columns
result_df = source_df.drop(labels=['ID', 'Role'], axis=1)
print(result_df)
Output:
    Name  ID    Role
2  David   3  Editor
     Name
0  Pankaj
1  Meghna
2   David
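To confirm that both styles give the same result, the two forms can be compared directly. This is a small sketch using the tutorial data; the variable names are illustrative.

```python
import pandas as pd

d1 = {'Name': ['Pankaj', 'Meghna', 'David'], 'ID': [1, 2, 3], 'Role': ['CEO', 'CTO', 'Editor']}
source_df = pd.DataFrame(d1)

# Columns: labels + axis (as a number or a name) versus the columns keyword
cols_by_axis = source_df.drop(labels=['ID', 'Role'], axis=1)
cols_by_axis_name = source_df.drop(labels=['ID', 'Role'], axis='columns')
cols_by_keyword = source_df.drop(columns=['ID', 'Role'])
print(cols_by_axis.equals(cols_by_keyword))       # True
print(cols_by_axis_name.equals(cols_by_keyword))  # True

# Rows: labels + axis=0 versus the index keyword
rows_by_axis = source_df.drop(labels=[0, 1], axis=0)
rows_by_keyword = source_df.drop(index=[0, 1])
print(rows_by_axis.equals(rows_by_keyword))       # True
```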
6. Suppressing Errors in Dropping Columns and Rows
If the DataFrame doesn’t contain the given labels, KeyError is raised.
result_df = source_df.drop(columns=['XYZ'])
Output:
KeyError: "['XYZ'] not found in axis"
We can suppress this error by specifying errors='ignore' in the drop() function call.
result_df = source_df.drop(columns=['XYZ'], errors='ignore')
print(result_df)
Output:
     Name  ID    Role
0  Pankaj   1     CEO
1  Meghna   2     CTO
2   David   3  Editor
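When the list mixes existing and missing labels, errors='ignore' still removes the ones that exist. The following sketch illustrates this with the tutorial data; the raised flag is a helper introduced here only to record whether the exception occurred.

```python
import pandas as pd

d1 = {'Name': ['Pankaj', 'Meghna', 'David'], 'ID': [1, 2, 3], 'Role': ['CEO', 'CTO', 'Editor']}
source_df = pd.DataFrame(d1)

# 'ID' exists and 'XYZ' does not: only 'ID' is removed, no error is raised
result_df = source_df.drop(columns=['ID', 'XYZ'], errors='ignore')
print(list(result_df.columns))  # ['Name', 'Role']

# With the default errors='raise' the same call fails and nothing is dropped
try:
    source_df.drop(columns=['ID', 'XYZ'])
    raised = False
except KeyError:
    raised = True
print(raised)  # True
```

Note that errors='ignore' can also hide typos in label names, so it is probably best reserved for cases where a missing label is genuinely expected.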
7. Conclusion
Pandas DataFrame drop() is a very useful function to drop unwanted columns and rows. There are two more functions that extend the drop() functionality:
- drop_duplicates() to remove duplicate rows
- dropna() to remove rows and columns with missing values
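As a quick taste of these two functions, here is a minimal sketch. The sample data, with one duplicated row and one missing ID, is made up for illustration.

```python
import pandas as pd

# Hypothetical data: rows 0 and 1 are identical, row 2 has a missing ID
df = pd.DataFrame({
    'Name': ['Pankaj', 'Pankaj', 'David'],
    'ID': [1, 1, None],
})

# Keep the first occurrence of each duplicated row
deduped = df.drop_duplicates()
print(deduped)

# Remove rows that contain a missing value
no_missing_rows = df.dropna()
print(no_missing_rows)

# Remove columns that contain a missing value
no_missing_cols = df.dropna(axis=1)
print(no_missing_cols)
```

Like drop(), both functions return a new DataFrame by default and leave the source unchanged.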