The pandas DataFrame loc[] indexer lets us access a group of rows and columns by label. We can pass labels as well as boolean arrays to select the rows and columns.
DataFrame loc[] Inputs
Some of the allowed inputs are:
- A single label – returns the row as a Series object.
- A list of labels – returns a DataFrame of the selected rows.
- A slice with labels – returns the specified rows, including both the start and stop labels. The result is a DataFrame, or a Series if a single column is also selected.
- A boolean array – returns a DataFrame of the rows marked True. The length of the array must match the length of the axis being selected.
- A conditional statement or callable function – must return a valid indexer (such as a boolean array or a list of labels) to select the rows and columns.
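The return types listed above can be checked directly. The following is a minimal sketch that rebuilds the sample DataFrame used in the examples below (same columns, default integer index); the variable names are only illustrative.

```python
import pandas as pd

# Same sample data as the examples in this article (default index 0, 1, 2)
df = pd.DataFrame({'Name': ['John', 'Jane', 'Mary'],
                   'ID': [1, 2, 3],
                   'Role': ['CEO', 'CTO', 'CFO']})

single = df.loc[1]                        # single label -> Series (one row)
several = df.loc[[0, 2]]                  # list of labels -> DataFrame
row_slice = df.loc[0:1]                   # label slice -> DataFrame, stop label included
col_slice = df.loc[0:1, 'Role']           # label slice plus one column -> Series
block = df.loc[[0, 2], ['Name', 'Role']]  # row labels and column labels -> DataFrame

# Print the type and shape of each selection
for name, obj in [('single', single), ('several', several),
                  ('row_slice', row_slice), ('col_slice', col_slice),
                  ('block', block)]:
    print(name, type(obj).__name__, obj.shape)
```

Note that a row slice on its own gives a DataFrame; it only becomes a Series once a single column label is added.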
DataFrame loc[] Examples
Let’s look at some examples of using the loc attribute of the DataFrame object. First, we will create a sample DataFrame to work with.
import pandas as pd
d1 = {'Name': ['John', 'Jane', 'Mary'], 'ID': [1, 2, 3], 'Role': ['CEO', 'CTO', 'CFO']}
df = pd.DataFrame(d1)
print('DataFrame:\n', df)
Output:
DataFrame:
    Name  ID Role
0  John   1  CEO
1  Jane   2  CTO
2  Mary   3  CFO
1. loc[] with a single label
row_1_series = df.loc[1]
print(type(row_1_series))
print(df.loc[1])
Output:
<class 'pandas.core.series.Series'>
Name    Jane
ID         2
Role     CTO
Name: 1, dtype: object
2. loc[] with a list of labels
row_0_2_df = df.loc[[0, 2]]
print(type(row_0_2_df))
print(row_0_2_df)
Output:
<class 'pandas.core.frame.DataFrame'>
   Name  ID Role
0  John   1  CEO
2  Mary   3  CFO
3. Getting a Single Value
We can specify the row and column labels to get the single value from the DataFrame object.
jane_role = df.loc[1, 'Role']
print(jane_role) # CTO
4. Slice with loc[]
We can also pass a slice of labels. Unlike regular Python slices, both the start and stop labels are included in the result. Here a single column is selected as well, so the result is a Series object.
roles = df.loc[0:1, 'Role']
print(roles)
Output:
0    CEO
1    CTO
Name: Role, dtype: object
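To make the inclusive stop label concrete, here is a small sketch that contrasts a loc[] label slice with an ordinary Python list slice. The names_list variable is made up here purely for comparison; the DataFrame is the same sample data as above.

```python
import pandas as pd

df = pd.DataFrame({'Name': ['John', 'Jane', 'Mary'],
                   'ID': [1, 2, 3],
                   'Role': ['CEO', 'CTO', 'CFO']})

# Hypothetical plain list holding the same names, used only for comparison
names_list = ['John', 'Jane', 'Mary']

list_result = names_list[0:1]                # Python slice: stop position 1 is excluded
loc_result = df.loc[0:1, 'Name'].tolist()    # loc[] slice: stop label 1 is included

print(list_result)   # ['John']
print(loc_result)    # ['John', 'Jane']
```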
5. loc[] with an array of Boolean values
row_1_df = df.loc[[False, True, False]]
print(row_1_df)
Output:
   Name  ID Role
1  Jane   2  CTO
Since the DataFrame has 3 rows, the boolean array must contain exactly 3 values. If its length does not match the length of the axis being selected, pandas raises an IndexError (the exact message depends on the pandas version).
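A quick way to see this length check in action is to pass a boolean list that is too short. This is a minimal sketch on the same sample data; the names short_mask and error are illustrative, and because the wording of the message differs between pandas versions, it is only printed rather than matched.

```python
import pandas as pd

df = pd.DataFrame({'Name': ['John', 'Jane', 'Mary'],
                   'ID': [1, 2, 3],
                   'Role': ['CEO', 'CTO', 'CFO']})

short_mask = [True, False]   # only 2 flags for a DataFrame with 3 rows
error = None

try:
    df.loc[short_mask]
except IndexError as err:
    # The boolean array length does not match the number of rows
    error = err

print(type(error).__name__, '-', error)
```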
6. loc[] with Conditional Statements
data = df.loc[df['ID'] > 1]
print(data)
Output: A DataFrame of the rows where the ID is greater than 1.
   Name  ID Role
1  Jane   2  CTO
2  Mary   3  CFO
7. DataFrame loc[] with Callable Function
We can also use a lambda function with the DataFrame loc[] attribute.
id_2_row = df.loc[lambda df1: df1['ID'] == 2]
print(id_2_row)
Output:
   Name  ID Role
1  Jane   2  CTO
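A callable is most helpful when the DataFrame being indexed is an intermediate result that has no variable name of its own, because the function receives whatever object loc[] is called on. The sketch below is one possible illustration on the same sample data; the chained filter is an example made up for this purpose.

```python
import pandas as pd

df = pd.DataFrame({'Name': ['John', 'Jane', 'Mary'],
                   'ID': [1, 2, 3],
                   'Role': ['CEO', 'CTO', 'CFO']})

# First keep rows with ID > 1, then filter that intermediate result.
# The lambda argument d is the already-filtered DataFrame, not the original df.
cfo_rows = df.loc[df['ID'] > 1].loc[lambda d: d['Role'] == 'CFO']

print(cfo_rows)
```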
Setting DataFrame Values using loc[] attribute
A useful feature of loc[] is that we can also use it to set DataFrame values. Let’s look at some examples of setting values with the loc[] attribute.
1. Setting a Single Value
We can specify the row and column labels to set the value of a specific cell.
import pandas as pd
d1 = {'Name': ['John', 'Jane', 'Mary'], 'ID': [1, 2, 3], 'Role': ['CEO', 'CTO', 'CFO']}
df = pd.DataFrame(d1, index=['A', 'B', 'C'])
print('Original DataFrame:\n', df)
# set a single value
df.loc['B', 'Role'] = 'Editor'
print('Updated DataFrame:\n', df)
Output:
Original DataFrame:
    Name  ID Role
A  John   1  CEO
B  Jane   2  CTO
C  Mary   3  CFO
Updated DataFrame:
    Name  ID    Role
A  John   1     CEO
B  Jane   2  Editor
C  Mary   3     CFO
2. Setting values of an entire row
If we specify only a row label, every value in that row is set to the given value.
df.loc['B'] = None
print('Updated DataFrame with None:\n', df)
Output:
Updated DataFrame with None:
    Name   ID  Role
A  John  1.0   CEO
B  None  NaN  None
C  Mary  3.0   CFO
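The ID values in this output are shown as 1.0 and 3.0 because an integer column cannot hold a missing value: assigning None stores NaN there, and pandas converts the column to a float dtype. The following sketch checks this on a fresh copy of the sample data; kind_before and kind_after are illustrative names.

```python
import pandas as pd

df = pd.DataFrame({'Name': ['John', 'Jane', 'Mary'],
                   'ID': [1, 2, 3],
                   'Role': ['CEO', 'CTO', 'CFO']},
                  index=['A', 'B', 'C'])

kind_before = df['ID'].dtype.kind   # 'i' means an integer dtype
df.loc['B'] = None                  # set every value in row B to None
kind_after = df['ID'].dtype.kind    # 'f' means a float dtype

print(kind_before, '->', kind_after)
print(df['ID'].isna())              # True only for row B
```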
3. Setting values of an entire column
We can use a slice to select all the rows and specify a column label to set every value in that column.
df.loc[:, 'Role'] = 'Employee'
print('Updated DataFrame Role to Employee:\n', df)
Output:
Updated DataFrame Role to Employee:
    Name   ID      Role
A  John  1.0  Employee
B  None  NaN  Employee
C  Mary  3.0  Employee
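4. Setting a Value Based on a Condition
We can pass a boolean condition as the row selector together with a column label to update only the matching rows.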
df.loc[df['ID'] == 1, 'Role'] = 'CEO'
print(df)
Output:
   Name   ID      Role
A  John  1.0       CEO
B  None  NaN  Employee
C  Mary  3.0  Employee
Conclusion
The pandas DataFrame loc[] attribute is useful because it lets us both get and set specific values. Its support for boolean conditions and callables such as lambda expressions makes it a flexible tool for selecting data.