Hey, readers! In this installment of our R programming series, we take a detailed look at one of the most widely used packages: the dplyr library in R.
So, let us begin!! 🙂
Usage of dplyr library in R
The dplyr library in R is widely used for quick, concise data manipulation prior to modeling. It offers a variety of functions that let us transform and clean data with ease.
It provides simple ‘verb’ functions, each named after the operation it performs, so that we can translate our thoughts into code easily. Moreover, the backend is efficient, which keeps these functions fast in use.
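To give a feel for how these verbs read when chained together, here is a minimal sketch. It assumes a small made-up data frame named `toy` (hypothetical, not the dataset used in this article); each verb is covered in its own section further down.

```r
library(dplyr)

# Hypothetical toy data, invented purely for illustration
toy <- data.frame(
  instant    = 1:5,
  weekday    = c(6, 0, 1, 2, 3),
  workingday = c(0, 0, 1, 1, 1)
)

# Read the chain top to bottom: keep working days,
# keep two columns, then average the weekday values
result <- toy %>%
  filter(workingday == 1) %>%
  select(instant, weekday) %>%
  summarise(avg = mean(weekday))

print(result)
```

Each step takes a data frame and returns a data frame, which is why the `%>%` pipe can pass the result of one verb straight into the next.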
In this article, we will use a bike rental dataset (Bike.csv) as the running example for our manipulations.
To use the functions provided by dplyr, we need to install the package and then load it into the R environment, as shown:
# install.packages("dplyr")  # run once if dplyr is not yet installed
library(dplyr)

# Remove all the existing objects
rm(list = ls())

# Set the working directory
setwd("D:/Ediwsor_Project - Bike_Rental_Count/")
getwd()

# Load the dataset
bike_data = read.csv("Bike.csv", header = TRUE)
Now that we know what the library is for, let us look at some of the most commonly used functions offered by dplyr!
1. The filter() function in dplyr library
The filter() function allows us to select a subset of rows from a data frame, so it can be considered a row-level function. We pass it the conditions on the column values that the retained rows must satisfy.
Here, we select all rows that have ‘weathersit’ = 2 and ‘workingday’ = 1.
bike_data %>% filter(weathersit == 2, workingday == 1)
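As a small illustration of how conditions combine, the sketch below uses a made-up data frame `toy` (hypothetical, standing in for bike_data). Conditions separated by commas are combined with AND, while `|` expresses OR.

```r
library(dplyr)

# Hypothetical toy data standing in for bike_data
toy <- data.frame(
  instant    = 1:6,
  weathersit = c(1, 2, 2, 3, 2, 1),
  workingday = c(1, 1, 0, 1, 1, 0)
)

# Comma-separated conditions must all hold (logical AND)
both <- toy %>% filter(weathersit == 2, workingday == 1)

# Use | when either condition is enough (logical OR)
either <- toy %>% filter(weathersit == 2 | workingday == 1)

print(both)
print(either)
```

Rows for which a condition evaluates to NA are dropped by filter(), which is worth keeping in mind when the data has missing values.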
2. The slice() function
As seen above, the filter() function subsets rows according to conditions on the column values. The slice() function, in contrast, subsets rows based on their index positions.
bike_data %>% slice(1:3)
Here, we have selected all the column values for the first 3 rows (1:3) only.
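Beyond a simple range, slice() also accepts a vector of positions, negative positions, and the helper n(). The following is a minimal sketch on a made-up data frame `toy` (hypothetical, not the article's dataset).

```r
library(dplyr)

# Hypothetical toy data, invented purely for illustration
toy <- data.frame(instant = 1:5, season = c(1, 1, 2, 2, 3))

# A range of positions: the first three rows
first_three <- toy %>% slice(1:3)

# Arbitrary positions: rows 1 and 4
picked <- toy %>% slice(c(1, 4))

# n() is the number of rows, so this keeps only the last row
last_row <- toy %>% slice(n())

# Negative positions drop rows: everything except the first two
without_first_two <- toy %>% slice(-(1:2))

print(without_first_two)
```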
3. The select() function in the dplyr library
Unlike the filter() and slice() functions, the select() function performs column-wise operations. It allows us to subset the data frame based on the column names provided as arguments.
Here, we select all the columns from ‘instant’ through ‘season’. As a result, all the rows of these 3 columns are printed (the first 20 rows are shown below).
bike_data %>% select(instant:season)
   instant     dteday season
1        1 01-01-2011      1
2        2 02-01-2011      1
3        3 03-01-2011      1
4        4 04-01-2011      1
5        5 05-01-2011      1
6        6 06-01-2011      1
7        7 07-01-2011      1
8        8 08-01-2011      1
9        9 09-01-2011      1
10      10 10-01-2011      1
11      11 11-01-2011      1
12      12 12-01-2011      1
13      13 13-01-2011      1
14      14 14-01-2011      1
15      15 15-01-2011      1
16      16 16-01-2011      1
17      17 17-01-2011      1
18      18 18-01-2011      1
19      19 19-01-2011      1
20      20 20-01-2011      1
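The range form is only one way to name columns. As a rough sketch on a made-up data frame `toy` (hypothetical, with a column layout loosely modeled on the output above), select() also takes individual names and, with a minus sign, the columns to leave out.

```r
library(dplyr)

# Hypothetical toy data, invented purely for illustration
toy <- data.frame(
  instant = 1:3,
  dteday  = c("01-01-2011", "02-01-2011", "03-01-2011"),
  season  = c(1, 1, 1),
  yr      = c(0, 0, 0)
)

# A contiguous range of columns, from 'instant' through 'season'
ranged <- toy %>% select(instant:season)

# Individual columns, in the order given
named <- toy %>% select(season, instant)

# Everything except 'dteday'
dropped <- toy %>% select(-dteday)

print(names(dropped))
```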
4. The mutate() function
With the mutate() function, we can add a new column (based on some arithmetic operation) to the existing data frame.
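In the example below, we assign the value of weekday * 10 to a column named ‘cnt’ in the resulting data frame. If the data frame already has a column called ‘cnt’, mutate() overwrites it; bike_data itself stays unchanged unless the result is assigned back to it.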
bike_data %>% mutate(cnt = weekday * 10)
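To make the point about return values concrete, here is a minimal sketch on a made-up data frame `toy` (hypothetical). The column name `weekday_x10` is also an invented name, chosen so that no existing column is overwritten.

```r
library(dplyr)

# Hypothetical toy data, invented purely for illustration
toy <- data.frame(instant = 1:3, weekday = c(6, 0, 1))

# mutate() returns a new data frame; store it to keep the new column
result <- toy %>% mutate(weekday_x10 = weekday * 10)

print(names(toy))     # the original data frame is unchanged
print(names(result))  # the stored result carries the extra column
```

Assigning the result to a new name, as done here, keeps the original data available for comparison; assigning it back to the same name would replace it.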
5. The summarize() function in the dplyr package
The summarize() function collapses the data frame into a single row of summary values, computed by the summary functions passed to it. summarise() is an equivalent spelling of the same function.
In the example below, we calculate the mean of the column ‘weekday’ and store the result in a new column ‘avg’.
bike_data %>% summarise(avg = mean(weekday))
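Several summaries can be requested in one call, each becoming its own column in the single-row result. The sketch below assumes a made-up data frame `toy` (hypothetical); the output column names `avg`, `highest`, and `rows` are likewise illustrative.

```r
library(dplyr)

# Hypothetical toy data, invented purely for illustration
toy <- data.frame(instant = 1:5, weekday = c(6, 0, 1, 2, 3))

# Each named argument becomes one column of the one-row result
stats <- toy %>%
  summarise(
    avg     = mean(weekday),  # mean of the weekday column
    highest = max(weekday),   # largest weekday value
    rows    = n()             # number of rows summarised
  )

print(stats)
```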
With this, we have come to the end of this topic. Feel free to comment below if you have any questions.
For more such posts related to R programming, stay tuned!!
Till then, Happy Learning!! 🙂