Hello, readers! In this article, we will be focusing on the **R group_by() function** in detail.

So, let us begin!!

Table of Contents

- 1 Usage of R group_by() function
- 2 Example 1: Grouping across a single column using group_by() function
- 3 Example 2: R group_by() with summarize() alongside n() function
- 4 Example 3: Grouping across multiple columns using group_by() function
- 5 Example 4: R group_by() with mutate() function
- 6 Conclusion
- 7 References

## Usage of R group_by() function

When working with data, we usually find the dataset in tabular form, as a combination of rows and columns. In data science and analytics, we often need to analyze the data not only as a whole but also in terms of groups formed by combinations of its values.

For example, consider a dataset that contains the marks of students along with factors such as subject, special groups of subjects, extracurricular activities, etc. In such a scenario, it is useful to be able to group the marks by these factors.

This is when the R group_by() function comes into the picture!

The `group_by()` function groups the rows of a table by one or more of its variables (columns). Operations applied afterwards, such as summaries, are then computed separately for each group rather than over the whole table.

The **dplyr** package provides the group_by() function.

**Syntax:**

```r
data_object %>% group_by(column_names)
```
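As a minimal sketch of the student-marks scenario described earlier, the snippet below groups a small, made-up table by subject and averages the marks per subject. The tibble `marks_tbl`, its columns `subject` and `marks`, and all the values are assumptions for illustration only; they do not come from a real dataset. The summarize() function used here is covered in the later examples.

```r
library(tibble)
library(dplyr)

# Hypothetical data: marks of students in two subjects (illustrative values only)
marks_tbl <- tibble(
  subject = c("Maths", "Maths", "Physics", "Physics"),
  marks   = c(80, 90, 60, 70)
)

# Group the rows by subject, then compute one mean per group
avg_marks <- marks_tbl %>%
  group_by(subject) %>%
  summarize(mean_marks = mean(marks))

print(avg_marks)  # one row per subject: Maths 85, Physics 65
```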

Now, let us have a look at the implementation of the same!

## Example 1: Grouping across a single column using group_by() function

In this example, we have created a vector of 20 numbers, ‘lst’, and two categorical variables using the rep() function: ‘Poll’ with values ‘Yes’ and ‘No’, and ‘S’ with values ‘r’ and ‘n’.

Further, we have combined these columns into a table using the `tibble()` function. After that, we have grouped the values by the ‘Poll’ variable as shown below!

**Example:**

```r
# Remove all existing objects from the workspace
rm(list = ls())

lst <- c(1:20)
Poll <- rep(c("Yes", "No"), 10)  # rep stands for replicate
S <- rep(c("r", "n"), 10)

# install.packages('tibble')
library('tibble')
dta <- tibble(lst, Poll, S)
# print(dta)

library('dplyr')
dta %>% group_by(Poll)
```

**Output:**

```
# A tibble: 20 x 3
# Groups:   Poll [2]
     lst Poll  S
   <int> <chr> <chr>
 1     1 Yes   r
 2     2 No    n
 3     3 Yes   r
 4     4 No    n
 5     5 Yes   r
 6     6 No    n
 7     7 Yes   r
 8     8 No    n
 9     9 Yes   r
10    10 No    n
11    11 Yes   r
12    12 No    n
13    13 Yes   r
14    14 No    n
15    15 Yes   r
16    16 No    n
17    17 Yes   r
18    18 No    n
19    19 Yes   r
20    20 No    n
```
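The printed table looks almost identical to the ungrouped one because group_by() only attaches grouping metadata (the `# Groups:` line); the rows themselves are unchanged. The sketch below rebuilds the same ‘dta’ tibble and inspects that metadata with dplyr's `group_vars()`, `n_groups()` and `ungroup()` helpers. The names `grouped` and `restored` are introduced here purely for illustration.

```r
library(tibble)
library(dplyr)

# Same data as in the example above
lst  <- 1:20
Poll <- rep(c("Yes", "No"), 10)
S    <- rep(c("r", "n"), 10)
dta  <- tibble(lst, Poll, S)

grouped <- dta %>% group_by(Poll)

group_vars(grouped)     # names of the grouping columns: "Poll"
n_groups(grouped)       # number of distinct groups: 2
nrow(grouped)           # still 20 rows; grouping does not drop or reorder rows

# ungroup() removes the grouping metadata again
restored <- grouped %>% ungroup()
group_vars(restored)    # character(0)
```

Calling ungroup() once the grouped step is done can help avoid later operations being applied per group by accident.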

## Example 2: R group_by() with summarize() alongside n() function

In the example below, we have combined the group_by() function with the summarize() function. We first group the table by the ‘Poll’ variable and then, within `summarize()`, call `n()`, which returns the number of rows in each group.

**Example:**

```r
# Remove all existing objects from the workspace
rm(list = ls())

lst <- c(1:20)
Poll <- rep(c("Yes", "No"), 10)  # rep stands for replicate
S <- rep(c("r", "n"), 10)

# install.packages('tibble')
library('tibble')
dta <- tibble(lst, Poll, S)
# print(dta)

library('dplyr')
dta %>%
  group_by(Poll) %>%
  summarize(n = n())
```

**Output:**

```
# A tibble: 2 x 2
  Poll      n
* <chr> <int>
1 No       10
2 Yes      10
```
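Counting rows per group is common enough that dplyr also offers `count()` as a shorthand. The sketch below, which rebuilds the same ‘dta’ tibble, suggests that `count(Poll)` gives the same counts as the group_by() and summarize() pipeline above. The names `by_summarize` and `by_count` are introduced here for illustration.

```r
library(tibble)
library(dplyr)

# Same data as in the example above
lst  <- 1:20
Poll <- rep(c("Yes", "No"), 10)
S    <- rep(c("r", "n"), 10)
dta  <- tibble(lst, Poll, S)

# Explicit version: group first, then count the rows in each group
by_summarize <- dta %>%
  group_by(Poll) %>%
  summarize(n = n())

# Shorthand: count() groups, counts and ungroups in one step
by_count <- dta %>% count(Poll)

identical(by_summarize$n, by_count$n)  # TRUE
```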

## Example 3: Grouping across multiple columns using group_by() function

In this example, we have grouped the table by the columns ‘Poll’ and ‘S’. Further, we have used the summarize() function to count the rows in each combination of the two. Because ‘Yes’ always occurs together with ‘r’ and ‘No’ with ‘n’ in this data, only two combinations exist.

**Example:**

```r
# dta is the tibble created in the previous example
dta %>%
  group_by(Poll, S) %>%
  summarize(n = n())
```

**Output:**

```
# A tibble: 2 x 3
# Groups:   Poll [2]
  Poll  S         n
  <chr> <chr> <int>
1 No    n        10
2 Yes   r        10
```
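Note that the result is still grouped by ‘Poll’: by default, summarize() drops only the last grouping variable (‘S’ here). A small sketch of how this could be controlled, assuming dplyr 1.0.0 or later, where summarize() accepts a `.groups` argument. The names `kept` and `dropped` are introduced here for illustration.

```r
library(tibble)
library(dplyr)

# Same data as in the earlier examples
lst  <- 1:20
Poll <- rep(c("Yes", "No"), 10)
S    <- rep(c("r", "n"), 10)
dta  <- tibble(lst, Poll, S)

# "drop_last" removes only the last grouping variable (S), as in the output above
kept <- dta %>%
  group_by(Poll, S) %>%
  summarize(n = n(), .groups = "drop_last")

# "drop" removes all grouping, returning a plain tibble
dropped <- dta %>%
  group_by(Poll, S) %>%
  summarize(n = n(), .groups = "drop")

group_vars(kept)     # "Poll"
group_vars(dropped)  # character(0)
```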

## Example 4: R group_by() with mutate() function

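Here, we have grouped the values by the columns ‘Poll’ and ‘S’. Further, we have used the `mutate()` function to add a new column, ‘res’, that holds the mean of the ‘lst’ column within each group, computed with the `mean()` function. Unlike summarize(), mutate() keeps every row of the table.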

**Example:**

```r
# Remove all existing objects from the workspace
rm(list = ls())

lst <- c(1:20)
Poll <- rep(c("Yes", "No"), 10)  # rep stands for replicate
S <- rep(c("r", "n"), 10)

# install.packages('tibble')
library('tibble')
dta <- tibble(lst, Poll, S)
# print(dta)

library('dplyr')
dta %>%
  group_by(Poll, S) %>%
  mutate(res = mean(lst))
```

**Output:**

```
# A tibble: 20 x 4
# Groups:   Poll, S [2]
     lst Poll  S       res
   <int> <chr> <chr> <dbl>
 1     1 Yes   r        10
 2     2 No    n        11
 3     3 Yes   r        10
 4     4 No    n        11
 5     5 Yes   r        10
 6     6 No    n        11
 7     7 Yes   r        10
 8     8 No    n        11
 9     9 Yes   r        10
10    10 No    n        11
11    11 Yes   r        10
12    12 No    n        11
13    13 Yes   r        10
14    14 No    n        11
15    15 Yes   r        10
16    16 No    n        11
17    17 Yes   r        10
18    18 No    n        11
19    19 Yes   r        10
20    20 No    n        11
```
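To make the difference between mutate() and summarize() on grouped data concrete, the sketch below runs both on the same ‘dta’ tibble. The names `with_mean` and `per_group` are introduced here for illustration, and the `.groups` argument assumes dplyr 1.0.0 or later.

```r
library(tibble)
library(dplyr)

# Same data as in the example above
lst  <- 1:20
Poll <- rep(c("Yes", "No"), 10)
S    <- rep(c("r", "n"), 10)
dta  <- tibble(lst, Poll, S)

# mutate(): every row is kept and receives its group's mean
with_mean <- dta %>%
  group_by(Poll, S) %>%
  mutate(res = mean(lst)) %>%
  ungroup()

# summarize(): each group collapses into a single row
per_group <- dta %>%
  group_by(Poll, S) %>%
  summarize(res = mean(lst), .groups = "drop")

nrow(with_mean)  # 20
nrow(per_group)  # 2
```

In short, mutate() fits when the group-level value is needed next to each original row, while summarize() fits when only one value per group is wanted.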

## Conclusion

With this, we have come to the end of this topic. Feel free to comment below if you have any questions.

For more such posts related to R programming, stay tuned with us!

Till then, Happy Learning!! 🙂