The dplyr library in R – 5 important functions to know!


Hey, readers! Today, in our series on R programming, we will take a detailed look at one of the most widely used packages: the dplyr library in R.

So, let us begin! 🙂


Usage of the dplyr library in R

The dplyr library in R is widely used for quick, readable data manipulation before modeling. In other words, it offers a variety of functions that let us transform and clean data with ease.

It provides simple ‘verb’ functions, such as filter(), select() and mutate(), whose names describe what they do, so we can translate our intentions into code easily. Moreover, its backend is implemented efficiently, which keeps these functions fast in practice.
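To illustrate the ‘verb’ style, here is a minimal sketch that chains three verbs with the pipe operator %>% (which dplyr exports). It uses R's built-in mtcars dataset purely as an assumption so that the snippet runs without Bike.csv; the column name hp_per_wt is hypothetical. Because every verb takes a data frame as its first argument, x %>% f(y) means f(x, y), so the piped and nested forms below compute the same result.

```r
library(dplyr)

# Pipe style: read top to bottom as a sequence of verbs
piped <- mtcars %>%
  filter(cyl == 4) %>%                # keep only 4-cylinder cars
  mutate(hp_per_wt = hp / wt) %>%     # hypothetical derived column
  select(mpg, hp_per_wt)              # keep two columns

# Equivalent nested form: the same verbs, read inside out
nested <- select(mutate(filter(mtcars, cyl == 4), hp_per_wt = hp / wt),
                 mpg, hp_per_wt)

print(head(piped, 3))
print(identical(piped, nested))
```

The pipe form is usually preferred because each step reads in the order it is applied.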

In this article, we will use the example below, a bike rental dataset (Bike.csv), to demonstrate these manipulations.

To use the functions provided by dplyr, we first need to install the package and then load it into the R environment, as shown below:

install.packages('dplyr')
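If the script is run repeatedly, reinstalling the package every time is unnecessary. A common pattern, shown here as a small sketch using base R's requireNamespace(), installs dplyr only when it is missing and then loads it:

```r
# Install dplyr only if it is not already available, then load it
if (!requireNamespace("dplyr", quietly = TRUE)) {
  install.packages("dplyr")
}
library(dplyr)
```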

Example:

library(dplyr)

#Remove all the existing objects from the workspace
rm(list = ls())

#Set the working directory (adjust this path to wherever Bike.csv is stored)
setwd("D:/Ediwsor_Project - Bike_Rental_Count/")
getwd()

#Load the dataset
bike_data = read.csv("Bike.csv", header = TRUE)

Dataset:

[Image: Dataset – Bike]

Now that we understand what the library offers, let us look at some of the most commonly used functions in dplyr!



1. The filter() function in the dplyr library

dplyr’s filter() function allows us to select a subset of rows based on their values. Thus, it can be considered a row-level function. We pass it one or more logical conditions on the columns, and only the rows for which every condition is TRUE are kept.

Here, we select all rows where ‘weathersit’ equals 2 and ‘workingday’ equals 1.

Example:

bike_data %>% filter(weathersit == 2, workingday == 1)

Output:

[Image: dplyr filter() output]
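Since Bike.csv is not bundled with this article, the sketch below uses a small hypothetical data frame (toy) that mimics two of its columns to show how filter() combines conditions. Comma-separated conditions are joined with AND, | expresses OR, and %in% matches any of several values; rows where a condition evaluates to NA are dropped.

```r
library(dplyr)

# Hypothetical toy data mimicking a few Bike.csv columns
toy <- data.frame(
  instant    = 1:6,
  weathersit = c(1, 2, 2, 3, NA, 2),
  workingday = c(1, 1, 0, 1, 1, 1)
)

# Comma-separated conditions are combined with AND
both <- toy %>% filter(weathersit == 2, workingday == 1)

# | combines conditions with OR; row 5 gives NA here and is dropped
either <- toy %>% filter(weathersit == 3 | workingday == 0)

# %in% keeps rows whose value matches any element of the vector
bad_weather <- toy %>% filter(weathersit %in% c(2, 3))

print(both)
```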

2. The slice() function

As seen above, the filter() function subsets rows based on conditions on column values. The slice() function, on the other hand, subsets rows by their position (row index).

Example:

bike_data %>% slice(1:3)

Here, we select all the columns for the first three rows (1:3) only.

Output:

[Image: dplyr slice() output]
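slice() accepts more than a simple range. The sketch below, on a hypothetical toy data frame, picks arbitrary positions, drops rows with negative indices, and uses slice_tail(), a helper that assumes dplyr 1.0.0 or later.

```r
library(dplyr)

# Hypothetical toy data: 10 rows
toy <- data.frame(instant = 1:10, season = rep(1:2, each = 5))

first_three <- toy %>% slice(1:3)          # rows 1, 2, 3
picked      <- toy %>% slice(c(2, 5, 9))   # any set of row positions
dropped     <- toy %>% slice(-(1:8))       # negative indices drop rows: keeps 9 and 10
last_two    <- toy %>% slice_tail(n = 2)   # last n rows (dplyr >= 1.0.0)

print(picked)
```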

3. The select() function in the dplyr library

Unlike the filter() and slice() functions, select() operates on columns: it subsets the data frame to the columns named in its arguments.

Here, we select all the columns from ‘instant’ through ‘season’ (the : operator selects a contiguous range of columns by name). As a result, all the rows of these three columns are printed.

Example:

bike_data %>% select(instant:season)

Output:

   instant     dteday season
1        1 01-01-2011      1
2        2 02-01-2011      1
3        3 03-01-2011      1
4        4 04-01-2011      1
5        5 05-01-2011      1
6        6 06-01-2011      1
7        7 07-01-2011      1
8        8 08-01-2011      1
9        9 09-01-2011      1
10      10 10-01-2011      1
11      11 11-01-2011      1
12      12 12-01-2011      1
13      13 13-01-2011      1
14      14 14-01-2011      1
15      15 15-01-2011      1
16      16 16-01-2011      1
17      17 17-01-2011      1
18      18 18-01-2011      1
19      19 19-01-2011      1
20      20 20-01-2011      1
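Besides a name range, select() understands several other ways of naming columns. The sketch below uses a hypothetical toy data frame with Bike.csv-style column names to show listing columns in a chosen order, excluding a column with -, the tidyselect helper ends_with(), and renaming while selecting.

```r
library(dplyr)

# Hypothetical toy data with a few Bike.csv-style column names
toy <- data.frame(
  instant = 1:3,
  dteday  = c("01-01-2011", "02-01-2011", "03-01-2011"),
  season  = 1,
  temp    = c(0.34, 0.36, 0.20),
  atemp   = c(0.36, 0.35, 0.19)
)

range_cols  <- toy %>% select(instant:season)        # contiguous range by name
named_cols  <- toy %>% select(season, instant)       # listed columns, in the order given
minus_cols  <- toy %>% select(-dteday)               # everything except dteday
helper_cols <- toy %>% select(ends_with("temp"))     # matches temp and atemp
renamed     <- toy %>% select(day = dteday, season)  # select and rename at once

print(names(helper_cols))
```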

4. The mutate() function

Using the mutate() function, we can add a new column (computed from existing columns, for example via an arithmetic operation) to a data frame.

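In the example below, we assign the value of weekday * 10 to the column ‘cnt’. Note that if a column with that name already exists, mutate() overwrites it instead of adding a new one. Also, mutate() returns a new data frame; bike_data itself is unchanged unless we assign the result back to it.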

Example:

bike_data %>% mutate(cnt = weekday * 10)

Output:

[Image: dplyr mutate() output]
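The following sketch, on a hypothetical toy data frame, contrasts adding a new column with overwriting an existing one, and shows that later expressions in the same mutate() call can use columns created earlier in it. The column names weekday_x10, half and quarter are illustrative only.

```r
library(dplyr)

# Hypothetical toy data with two Bike.csv-style columns
toy <- data.frame(weekday = c(0, 3, 6), cnt = c(100, 200, 300))

# A new name adds a column at the end
added <- toy %>% mutate(weekday_x10 = weekday * 10)

# An existing name replaces that column's values
overwritten <- toy %>% mutate(cnt = weekday * 10)

# Several columns at once; quarter is computed from half
several <- toy %>% mutate(half = cnt / 2, quarter = half / 2)

# toy itself is unchanged: mutate() returns a new data frame
print(names(toy))
```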

5. The summarize() function in the dplyr package

The summarize() function (also available as summarise(), the spelling used in the example below) collapses the data frame into a single row of summary values, one for each expression passed to it.

In the example below, we calculate the mean of the ‘weekday’ column and store the result in a new column named ‘avg’.

Example:

bike_data %>% summarise(avg = mean(weekday))

Output:

  avg
  2.9
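summarise() can compute several statistics at once, and it is commonly paired with group_by() to get one summary row per group instead of one for the whole data frame. The sketch below assumes a small hypothetical data frame; n() is dplyr's row-count helper, usable inside summarise().

```r
library(dplyr)

# Hypothetical toy data with Bike.csv-style columns
toy <- data.frame(season = c(1, 1, 2, 2, 2), weekday = c(0, 2, 1, 3, 5))

# Whole data frame -> one row, one column per summary expression
overall <- toy %>% summarise(avg = mean(weekday), n = n())

# With group_by(), summarise() returns one row per season
by_season <- toy %>%
  group_by(season) %>%
  summarise(avg = mean(weekday), n = n())

print(overall)
print(by_season)
```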

Conclusion

With this, we have come to the end of this topic. Feel free to comment below if you have any questions.

For more such posts on R programming, stay tuned!

Till then, Happy Learning! 🙂

