The tidyr library in R: 5 important functions to know!


Hello, readers! In this article, we will take a detailed look at tidyr, an important R library for data cleaning.

So, let us get started!!


Usage of tidyr library in R

Data cleaning is an essential step before applying Machine Learning models to a dataset for predictions. In R programming, the tidyr library serves this purpose.

The tidyr library helps us reshape data into a simple, clean ("tidy") form, in which each variable has its own column and each observation has its own row. Data stored this way needs less preparation before analysis and modelling.

For the examples in this article, we will use the dataset shown below.

Have a look!

Figure: Dataset-Bike

1. The fill() function

The fill() function of the tidyr package lets us impute the missing values of a specific column. By default, each NA value in the passed column is replaced by the most recent non-missing entry above it in that column.

Example:

In the example below, we fill the missing values of the column 'holiday'. Each NA value is replaced by the previous entry in the column, i.e. '0'.

bike_data %>% fill(holiday)

Output:

Figure: Tidyr fill() function
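Since the output above is only available as an image, here is a minimal sketch of the same behaviour on a small, made-up data frame. The name `toy` and its values are assumptions for illustration and are not taken from the bike dataset.

```r
library(tidyr)

# Hypothetical stand-in for bike_data; the values are invented for illustration.
toy <- data.frame(
  instant = 1:5,
  holiday = c(0, NA, NA, 1, NA)
)

# Default direction is "down": each NA takes the last non-missing value above it.
filled_down <- fill(toy, holiday)

# .direction = "up" takes the next non-missing value below instead;
# a trailing NA has nothing below it, so it stays NA.
filled_up <- fill(toy, holiday, .direction = "up")

print(filled_down)
print(filled_up)
```

Note that a leading NA would remain NA with the default direction, because there is no earlier value to copy.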

2. The replace_na() function

Unlike the fill() function, the replace_na() function replaces the NA values of one or more columns with specific, user-defined values.

Example:

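In the example below, we replace the NA values of the following columns:

  • yr -> 0
  • holiday -> 'unknown'
  • workingday -> 1
  • mnth -> '12'

Note that in recent tidyr releases each replacement value must be compatible with the type of its column, so a character replacement such as 'unknown' only works if 'holiday' is stored as a character column.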

bike_data %>% replace_na(list(yr=0,holiday="unknown",workingday=1,mnth="12"))

Output:

Figure: Tidyr replace_na() function
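As a rough, self-contained sketch of the same call, the snippet below uses a made-up data frame named `toy` (an assumption, not the bike dataset). The 'holiday' column is deliberately stored as character so that "unknown" is a compatible replacement.

```r
library(tidyr)

# Hypothetical stand-in for bike_data; the values are invented for illustration.
toy <- data.frame(
  yr = c(1, NA, 0),
  holiday = c("0", NA, "1"),  # character column, so "unknown" is type-compatible
  workingday = c(NA, 0, 1),
  stringsAsFactors = FALSE
)

# One replacement value per column, passed as a named list.
cleaned <- replace_na(toy, list(yr = 0, holiday = "unknown", workingday = 1))
print(cleaned)

# replace_na() also works on a plain vector with a single replacement value.
vec_cleaned <- replace_na(c(1, NA, 3), 0)
print(vec_cleaned)
```

Columns that are not named in the list are left untouched, so any NA values in them remain.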

3. The drop_na() function

Using the drop_na() function, we can drop every row that contains a missing value. That is, when called without column arguments, drop_na() deletes all rows that have an NA in any column.

Example:

bike_data <- drop_na(bike_data)
print(bike_data)

Output:

Figure: Tidyr drop_na() function
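The sketch below shows drop_na() on a small, made-up data frame (the name `toy` and its values are assumptions). It also shows that naming a column restricts the check to that column only.

```r
library(tidyr)

# Hypothetical stand-in for bike_data; the values are invented for illustration.
toy <- data.frame(
  instant = 1:4,
  holiday = c(0, NA, 1, 0),
  temp = c(0.34, 0.36, NA, 0.20)
)

# No columns given: drop every row that has an NA in any column.
all_complete <- drop_na(toy)

# Column given: drop only the rows where 'holiday' is NA.
holiday_only <- drop_na(toy, holiday)

print(all_complete)
print(holiday_only)
```

Restricting the check to specific columns can preserve rows that are incomplete only in columns you do not need.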

4. The gather() function

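The gather() function takes multiple columns and collapses them into key-value pairs. This reshapes the dataset from wide to long format: the result has fewer columns and more rows. Note that gather() is superseded in current tidyr releases in favor of pivot_longer(), although it still works.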

Example:

bike_data %>%  
  gather(day_type, day, 
         weekday:workingday) 

In the above example, the selection weekday:workingday picks every column from 'weekday' through 'workingday'. The names of those columns go into the new key column 'day_type', and their values go into the new value column 'day'.

Output:

Figure: Tidyr gather() function
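To make the reshaping concrete, here is a minimal sketch on a made-up two-row data frame (the name `toy` and its values are assumptions). It runs the gather() call from above and, for comparison, what should be the equivalent pivot_longer() call.

```r
library(tidyr)

# Hypothetical stand-in for bike_data; the values are invented for illustration.
toy <- data.frame(
  instant = 1:2,
  weekday = c(6, 0),
  workingday = c(0, 1)
)

# gather(): column names go into 'day_type', their values into 'day'.
long_gather <- gather(toy, day_type, day, weekday:workingday)

# pivot_longer(): the newer interface for the same wide-to-long reshape.
long_pivot <- pivot_longer(
  toy,
  cols = weekday:workingday,
  names_to = "day_type",
  values_to = "day"
)

print(long_gather)
print(long_pivot)
```

Both results have four rows (two original rows times two gathered columns). The row order differs: gather() stacks the data column by column, while pivot_longer() keeps the rows of each original observation together.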

5. The nest() function

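The nest() function packs selected columns into a list-column of data frames. Each row of the result holds one unique combination of the remaining (non-nested) columns, together with a small data frame of the nested values that belong to it. Unlike a summarization function, nest() keeps every original value.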

Example:

bike_data %>% nest(data = c(weathersit))

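Here, we have nested the column 'weathersit' into a list-column named 'data'; all the remaining columns act as the grouping columns. To get one row per 'weathersit' value instead, nest the other columns, for example with nest(data = -weathersit).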

Output:

Figure: Tidyr nest() function
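The following sketch illustrates how the choice of nested columns determines the grouping. It uses a made-up data frame named `toy` (an assumption, not the bike dataset) and nests every column except 'weathersit', which gives one row per weather situation.

```r
library(tidyr)

# Hypothetical stand-in for bike_data; the values are invented for illustration.
toy <- data.frame(
  instant = 1:4,
  weathersit = c(1, 1, 2, 2),
  cnt = c(10, 20, 30, 40)
)

# Nest 'instant' and 'cnt'; the non-nested column 'weathersit' defines the groups.
by_weather <- nest(toy, data = c(instant, cnt))
print(by_weather)

# Each element of the list-column is a data frame with the nested columns.
print(by_weather$data[[1]])

# unnest() reverses the operation and restores one row per original observation.
restored <- unnest(by_weather, cols = data)
print(restored)
```

Because the nested data frames keep all their rows, unnest() can recover the original observations, which a summarization would not allow.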

Conclusion

With this, we have come to the end of this topic. Feel free to comment below if you have any questions.

For more posts related to R, stay tuned, and until then, Happy Learning!! 🙂

