Pandas cut() Function Examples


1. Pandas cut() Function

The pandas cut() function sorts the values of an array into discrete bins (intervals). It works only on one-dimensional array-like objects.
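As a rough illustration of the one-dimensional requirement, the sketch below flattens a small 2-D NumPy array before binning it. The array values, the bin edges, and the variable names are made up for this example. In the pandas versions I am aware of, passing the 2-D array directly raises a ValueError.

```python
import numpy as np
import pandas as pd

# Hypothetical 2-D data, used only for illustration
grid = np.array([[3, 12, 27], [41, 8, 19]])

# cut() expects one-dimensional input, so flatten the array first
flat = grid.ravel()

# Bin edges chosen arbitrarily for this sketch
binned = pd.cut(flat, bins=[0, 10, 20, 50])
print(binned)
```

For array input the result is a Categorical with one interval per element. If the original shape matters, the integer codes in binned.codes can be reshaped back to it.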

2. Usage of Pandas cut() Function

The cut() function is useful when we have a large amount of numeric data and want to group it into ranges for statistical analysis.

For example, let’s say we have an array of integers from 1 to 19. We want to divide them into two bins, [1, 10) and [10, 20), and label them “Lows” and “Highs”. We can do this easily with the pandas cut() function.

Furthermore, we can then apply functions to the elements of a specific bin or label, for example counting or averaging them.
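One way to work with the elements of each bin is to group by the binned column. The sketch below is only an illustration: the sample numbers, the column names, and the choice of count and mean as aggregations are assumptions, not part of the original example.

```python
import pandas as pd

# Hypothetical sample data for illustration
scores = pd.DataFrame({'num': [2, 5, 10, 13, 16, 18]})

# Label each number as 'Lows' ([1, 10)) or 'Highs' ([10, 20))
scores['label'] = pd.cut(
    scores['num'], bins=[1, 10, 20], labels=['Lows', 'Highs'], right=False
)

# Aggregate the numbers within each label
# observed=True restricts the result to labels that occur in the data
summary = scores.groupby('label', observed=True)['num'].agg(['count', 'mean'])
print(summary)
```

Because cut() returns categorical data, the groups come out in bin order rather than alphabetical order.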

3. Pandas cut() function syntax

The cut() function syntax is:


cut(
    x,
    bins,
    right=True,
    labels=None,
    retbins=False,
    precision=3,
    include_lowest=False,
    duplicates="raise",
)

  • x is the input array to be binned. It must be one-dimensional.
  • bins defines the bin edges for the segmentation. It can be an integer (the number of equal-width bins), a sequence of edges, or an IntervalIndex.
  • right indicates whether each bin includes its right edge. The default is True.
  • labels specifies the labels for the returned bins. Pass False to get integer bin indicators instead.
  • retbins specifies whether to also return the bin edges. The default is False.
  • precision specifies the precision at which to store and display the bin labels.
  • include_lowest specifies whether the first interval should be left-inclusive.
  • duplicates specifies what to do if the bin edges are not unique: "raise" raises a ValueError and "drop" removes the duplicate edges.
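A few of these parameters are easier to follow with a small sketch. The snippet below passes an integer for bins, asks for the computed edges with retbins=True, and uses duplicates="drop" on a deliberately repeated edge. The sample values and variable names are made up for illustration.

```python
import pandas as pd

# Hypothetical sample values for illustration
values = pd.Series([1, 7, 5, 4, 6, 3])

# An integer for bins requests that many equal-width bins;
# retbins=True returns the computed bin edges as a second value
binned, edges = pd.cut(values, bins=3, retbins=True)
print(binned)
print(edges)

# The edge 3 appears twice here; duplicates='drop' removes the repeat
# instead of raising a ValueError
deduped = pd.cut(values, bins=[0, 3, 3, 7], duplicates='drop')
print(deduped.cat.categories)
```

With an integer for bins, pandas extends the range slightly so that the minimum value lands inside the first bin, which is why the lowest computed edge is a little below the smallest input value.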

4. Pandas cut() function examples

Let’s look at some examples of the pandas cut() function. I will use NumPy to generate random numbers to populate the DataFrame object.

4.1) Segment Numbers into Bins


import pandas as pd
import numpy as np

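# Ten random integers from 1 to 99 (the upper bound of randint is exclusive)
df_nums = pd.DataFrame({'num': np.random.randint(1, 100, 10)})
print(df_nums)

# Assign each number to one of four right-closed bins
df_nums['num_bins'] = pd.cut(x=df_nums['num'], bins=[1, 25, 50, 75, 100])
print(df_nums)

# Show the distinct bins that actually occur in the data
print(df_nums['num_bins'].unique())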

Output:


   num
0   80
1   40
2   25
3    9
4   66
5   13
6   63
7   33
8   20
9   60

   num   num_bins
0   80  (75, 100]
1   40   (25, 50]
2   25    (1, 25]
3    9    (1, 25]
4   66   (50, 75]
5   13    (1, 25]
6   63   (50, 75]
7   33   (25, 50]
8   20    (1, 25]
9   60   (50, 75]

[(75, 100], (25, 50], (1, 25], (50, 75]]
Categories (4, interval[int64]): [(1, 25] < (25, 50] < (50, 75] < (75, 100]]

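Notice that 25 is part of the bin (1, 25]. That’s because each bin includes its right edge by default. If you don’t want that, pass right=False to the cut() function. Also note that a value of 1 would not fall into any of these bins and would be assigned NaN, because the first bin excludes its left edge. Pass include_lowest=True to include it.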
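To compare the edge handling side by side, here is a small sketch that bins the same values three ways. The four sample numbers are chosen by hand to sit on or near the bin edges, and the variable names are illustrative.

```python
import pandas as pd

# Hand-picked values on or near the bin edges
nums = pd.Series([1, 25, 50, 99])
edges = [1, 25, 50, 75, 100]

# Default: right-closed bins such as (1, 25]; the value 1 matches no bin
default = pd.cut(nums, bins=edges)

# right=False: left-closed bins such as [1, 25), so 25 moves to the next bin
left_closed = pd.cut(nums, bins=edges, right=False)

# include_lowest=True: the first bin also includes its left edge
with_lowest = pd.cut(nums, bins=edges, include_lowest=True)

print(pd.DataFrame({
    'num': nums,
    'default': default,
    'right=False': left_closed,
    'include_lowest': with_lowest,
}))
```

In the default column the first row is NaN, since 1 is not greater than the lowest edge.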

4.2) Adding Labels to Bins


import pandas as pd
import numpy as np

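# Ten random integers from 1 to 19 (the upper bound of randint is exclusive)
df_nums = pd.DataFrame({'num': np.random.randint(1, 20, 10)})
print(df_nums)

# right=False makes the bins [1, 10) and [10, 20), so 10 is labeled 'Highs'
df_nums['nums_labels'] = pd.cut(x=df_nums['num'], bins=[1, 10, 20], labels=['Lows', 'Highs'], right=False)

print(df_nums)

print(df_nums['nums_labels'].unique())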

Since we want 10 to be part of Highs, we pass right=False in the cut() function call, which makes the bins [1, 10) and [10, 20).

Output:


   num
0    5
1   16
2    6
3   13
4    2
5   10
6   18
7   10
8    2
9   18

   num nums_labels
0    5        Lows
1   16       Highs
2    6        Lows
3   13       Highs
4    2        Lows
5   10       Highs
6   18       Highs
7   10       Highs
8    2        Lows
9   18       Highs

[Lows, Highs]
Categories (2, object): [Lows < Highs]
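Once the labels are in place, a natural next step is to count how many values fall under each one. The sketch below reuses the ten numbers from the output above in place of random data. It also shows labels=False, which returns integer bin indicators. The variable names are illustrative.

```python
import pandas as pd

# The ten numbers from the sample output above, hard-coded for repeatability
nums = pd.Series([5, 16, 6, 13, 2, 10, 18, 10, 2, 18])

labels = pd.cut(nums, bins=[1, 10, 20], labels=['Lows', 'Highs'], right=False)

# Count how many numbers received each label
counts = labels.value_counts()
print(counts)

# labels=False returns the bin position (0 or 1 here) for each number
codes = pd.cut(nums, bins=[1, 10, 20], labels=False, right=False)
print(codes)
```

The integer form can be convenient when the bin position feeds into further numeric processing.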
