Hello, readers! In this article, we will be focusing on the R cut() function, in detail.
So, let us begin!
How to use the R cut() function?
When working with data, we often need to group numeric values into a set of intervals (bins). This is where the R cut() function comes into the picture.
The cut() function divides a numeric vector into intervals and records which interval each value falls into. The intervals are determined by the breaks argument.
The result is a factor whose levels are the intervals, and the labels parameter lets us give those levels names of our own.
cut(x, breaks, labels = NULL, include.lowest = FALSE, right = TRUE)
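- x: the numeric vector to be divided into intervals.
- breaks: either a numeric vector of two or more unique cut points, or a single number (2 or greater) giving the number of intervals to create.
- labels: the names for the levels of the resulting factor. By default, labels are built from the interval boundaries, such as "(a,b]". If labels = FALSE, plain integer codes are returned instead of a factor.
- include.lowest: a logical value indicating whether a value equal to the lowest break (or the highest break when right = FALSE) should be included.
- right: a logical value indicating whether the intervals are closed on the right and open on the left (the default), or the other way round.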
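To make the last three arguments concrete, here is a minimal sketch on a small made-up vector. The values and variable names are illustrative assumptions and are not part of the article's data set.

```r
# Illustrative values; each one sits exactly on a break point.
x <- c(0, 5, 10)
brks <- c(0, 5, 10)

# Defaults (right = TRUE, include.lowest = FALSE) give (0,5] and (5,10].
# 0 equals the lowest break and is not covered, so it becomes NA.
default_res <- cut(x, breaks = brks)

# include.lowest = TRUE closes the first interval: [0,5] and (5,10].
lowest_res <- cut(x, breaks = brks, include.lowest = TRUE)

# right = FALSE flips the closed side: [0,5) and [5,10).
# Now 10 equals the highest break and is not covered, so it becomes NA.
left_res <- cut(x, breaks = brks, right = FALSE)

# labels = FALSE returns integer codes instead of a factor.
codes <- cut(x, breaks = brks, labels = FALSE, include.lowest = TRUE)

print(default_res)
print(lowest_res)
print(left_res)
print(codes)
```

Comparing the four results side by side shows that the arguments only change what happens to values lying exactly on a break point.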
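When we pass a single number y as breaks, the range of the numeric vector is divided into y intervals of equal width. The outer limits are then moved outward by 0.1% of the range so that the minimum and maximum values both fall inside an interval.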
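As a small sketch of this behaviour, the snippet below splits the illustrative vector 0:10 (an assumption chosen for easy arithmetic) into two equal-width intervals and prints the resulting levels and counts.

```r
# Eleven illustrative values from 0 to 10.
x <- 0:10

# A single number asks for that many equal-width intervals.
halves <- cut(x, breaks = 2)

# The outer limits are pushed out by 0.1% of the range (here 10 * 0.001 = 0.01),
# so the minimum value 0 is not lost to the open left end of the first interval.
print(levels(halves))
print(table(halves))
```

The values 0 to 5 land in the first interval and 6 to 10 in the second, so no value is turned into NA.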
Let us now have a look at different variants of the cut() function in the subsequent sections!
1. R cut() with breaks parameter
In the example below, we pass a numeric vector to the function and set breaks to 3, which means the range of the vector is divided into 3 equal-width intervals.
# Clear the workspace
rm(list = ls())
# Sample data: seven numeric values
data = c(1200, 34567, 3456, 12, 3456, 985, 1211)
# Divide the range of the data into 3 equal-width intervals
cut_res = cut(data, 3)
cut_res
# Count the values that fall in each interval
table(cut_res)
As a result, the cut() function returns a factor with 3 levels, one per interval. We then use the table() function to count how many values fall into each interval.
Of the 7 values, 6 land in the first interval, none in the second, and 1 in the third.
> cut_res
(-22.6,1.15e+04] (2.3e+04,3.46e+04] (-22.6,1.15e+04] (-22.6,1.15e+04] (-22.6,1.15e+04] (-22.6,1.15e+04] (-22.6,1.15e+04]
Levels: (-22.6,1.15e+04] (1.15e+04,2.3e+04] (2.3e+04,3.46e+04]
> table(cut_res)
cut_res
  (-22.6,1.15e+04] (1.15e+04,2.3e+04] (2.3e+04,3.46e+04]
                 6                  0                  1
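The default labels above use scientific notation because cut() formats the break points with 3 significant digits. As a hedged sketch, the dig.lab argument of cut() (not used elsewhere in this article) can be raised to get more readable labels for the same data.

```r
# Same sample data as in the example above.
data <- c(1200, 34567, 3456, 12, 3456, 985, 1211)

# dig.lab sets how many significant digits are used when building the
# default labels; its default value is 3.
cut_res <- cut(data, breaks = 3, dig.lab = 5)

print(levels(cut_res))
print(table(cut_res))
```

Only the level names change here. The intervals and the counts per interval stay the same.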
2. R cut() with labels
By default, the levels of the resulting factor are named after the intervals themselves.
To use our own names instead, we pass them to the labels parameter within the function.
# Clear the workspace
rm(list = ls())
# Sample data: seven numeric values
data = c(1200, 34567, 3456, 12, 3456, 985, 1211)
# Divide into 3 intervals and name them XXS, XS and S
cut_res = cut(data, 3, labels = c('XXS', 'XS', 'S'))
table(cut_res)
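As seen below, the function has divided the numeric vector into 3 intervals labelled XXS, XS, and S. The number of labels must match the number of intervals: when breaks is a single number, the two counts are the same; when breaks is a vector of cut points, the number of labels is one fewer than the number of cut points.
XXS  XS   S
  6   0   1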
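The sketch below shows the labels count when breaks is a vector of cut points. The break points and label names are illustrative assumptions chosen for the sample data, and tryCatch() is used only to show that a wrong number of labels raises an error.

```r
# Same sample data as in the example above.
data <- c(1200, 34567, 3456, 12, 3456, 985, 1211)

# Four illustrative break points define three intervals, so three labels are needed.
brks <- c(0, 1000, 10000, 50000)
sizes <- cut(data, breaks = brks, labels = c("small", "medium", "large"))
print(table(sizes))

# Passing a label vector of the wrong length is an error; tryCatch() captures it.
bad <- tryCatch(
  cut(data, breaks = brks, labels = c("small", "large")),
  error = function(e) "error"
)
print(bad)
```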
3. Using the cut() function to get values between an interval
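As discussed above, a single number for breaks divides the numeric vector into that many intervals. We can also pass a vector of cut points to set the interval boundaries ourselves.
In the example below, breaks = -5:5 creates ten unit-width intervals between -5 and +5, and each of the 200 random normal values is assigned to one of them. Because rnorm() draws random values, the counts differ from run to run.
# Clear the workspace
rm(list = ls())
# Draw 200 values from a standard normal distribution
data = rnorm(200)
# Use the integers from -5 to 5 as cut points
cut_res = cut(data, breaks = -5:5)
table(cut_res)
cut_res
(-5,-4] (-4,-3] (-3,-2] (-2,-1]  (-1,0]   (0,1]   (1,2]   (2,3]   (3,4]   (4,5]
      0       1       5      32      66      66      29       1       0       0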
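One thing to keep in mind with explicit cut points is that values not covered by any interval become NA. The following sketch uses a few made-up values to show this, and one possible way around it: adding -Inf and Inf as outer cut points.

```r
# Illustrative values: two fall outside -5..5, and -5 sits on the lowest break.
x <- c(-6, -5, 0.5, 7)

# Values outside the cut points, or equal to the lowest one, become NA.
bounded <- cut(x, breaks = -5:5)
print(bounded)

# Adding -Inf and Inf as outer cut points catches both tails instead.
open_ended <- cut(x, breaks = c(-Inf, -5:5, Inf))
print(open_ended)
```

Whether open-ended intervals are appropriate depends on the data. For values that should never leave a known range, an NA can be a useful warning sign.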
With this, we have come to the end of this topic. Feel free to comment below if you have any questions. For more posts related to R programming, stay tuned with us!
Till then, Happy Learning!! 🙂