
Fill Missing Values In R using Tidyr, Fill Function

Published on August 3, 2022

By Prajwal CN



Missing data occurs when the value of a variable is absent from a record. If it is not treated properly, it can cause serious issues in the data modeling process, because many algorithms cannot handle missing values.

There are many ways to handle missing data in R. You can drop the affected records, but keep in mind that you are discarding information when you do so and may lose a potential edge in modeling. Alternatively, you can impute the missing data with the mean or median of the observed values. In this article, we will look at filling missing values in R using the tidyr package.
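The drop and impute options mentioned above can be sketched in base R as follows. This is only a minimal illustration: the `scores` vector is a hypothetical example, not the data used later in this article.

```r
# Hypothetical scores with missing values (assumed data for illustration)
scores <- c(86, NA, 88, NA, 90)

# Option 1: drop the missing records
dropped <- scores[!is.na(scores)]

# Option 2: impute with the mean or the median of the observed values
mean_imputed   <- ifelse(is.na(scores), mean(scores, na.rm = TRUE), scores)
median_imputed <- ifelse(is.na(scores), median(scores, na.rm = TRUE), scores)

dropped         # 86 88 90
mean_imputed    # 86 88 88 88 90
median_imputed  # 86 88 88 88 90
```

Note that dropping shortens the vector, while imputation keeps its length but replaces every gap with the same summary value.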

tidyr is an R package that offers many functions to help you tidy your data. The greater the data quality, the better the model!


1. Missing Data in R

  • In R, missing values are usually represented as NA. NaN ("not a number") is also treated as missing by is.na().
  • A missing value is an absent record in a variable. It can be a single value or an entire row.
  • Missing values can occur in both numerical and categorical data.
  • R offers many methods to deal with missing data.
  • The tidyr package helps fill missing data using a top-down or bottom-up approach.
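As a small illustration of the points above, the base R functions `is.na()` and `is.nan()` can be used to detect and count missing values. The vector `x` here is a made-up example.

```r
# Made-up numeric vector containing both NA and NaN
x <- c(1, NA, 3, NaN)

na_flags  <- is.na(x)   # TRUE for both NA and NaN
nan_flags <- is.nan(x)  # TRUE only for NaN
n_missing <- sum(is.na(x))

na_flags   # FALSE  TRUE FALSE  TRUE
nan_flags  # FALSE FALSE FALSE  TRUE
n_missing  # 2
```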

2. Tidyr Package in R

  • The tidyr package is used to clean raw data in R.
  • It offers functions for cleaning, organizing, filling missing values and more.
  • We will be using tidyr together with the pipe operator (%>%).

To install and load the tidyr package, run the code below in R.

#Install tidyr package

install.packages('tidyr')


#Load the library

library(tidyr)

package ‘tidyr’ successfully unpacked and MD5 sums checked

After a successful installation, you will see a confirmation message like the one shown above. The exact text can differ by platform.
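If you want to confirm from code that the package is available before loading it, a check along these lines can help. It is a sketch, not a required step; `tidyr_available` is just an illustrative variable name.

```r
# Check whether tidyr is installed without raising an error if it is not
tidyr_available <- requireNamespace("tidyr", quietly = TRUE)

if (tidyr_available) {
  library(tidyr)
  print(packageVersion("tidyr"))  # shows the installed version
} else {
  message("tidyr is not installed; run install.packages('tidyr') first")
}
```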


3. Create a Dataframe

First, we create a simple sample data frame that has missing values. We will use it to demonstrate how the fill function of tidyr fills missing data.

#Create a dataframe

a <- c('A','B','C','D','E','F','G','H','I','J')
b <- c('Roger','Carlo','Durn','Jessy','Mounica','Rack','Rony','Saly','Kelly','Joseph')
c <- c(86,NA,NA,NA,88,NA,NA,86,NA,NA)

df <- data.frame(a,b,c)
df
   a       b  c
1  A   Roger 86
2  B   Carlo NA
3  C    Durn NA
4  D   Jessy NA
5  E Mounica 88
6  F    Rack NA
7  G    Rony NA
8  H    Saly 86
9  I   Kelly NA
10 J  Joseph NA

We now have our data frame, but it has a lot of missing values. In cases like this, you can use the fill function in tidyr to replace each missing value with a neighboring non-missing value.
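Before filling anything, it can be useful to check how many values are missing and where. The following base R sketch rebuilds the same data frame so that it runs on its own; `na_counts` and `na_rows` are illustrative names.

```r
# Rebuild the sample data frame from this section
a <- c('A','B','C','D','E','F','G','H','I','J')
b <- c('Roger','Carlo','Durn','Jessy','Mounica','Rack','Rony','Saly','Kelly','Joseph')
c <- c(86,NA,NA,NA,88,NA,NA,86,NA,NA)
df <- data.frame(a, b, c)

# Number of missing values per column
na_counts <- colSums(is.na(df))
na_counts   # a: 0, b: 0, c: 7

# Row positions of the missing values in column c
na_rows <- which(is.na(df$c))
na_rows     # 2 3 4 6 7 9 10
```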


4. Two Different Approaches

As mentioned earlier, you can fill in missing data from neighboring values. The fill function offers two basic directions for this:

  • Up - Each missing value is filled with the next non-missing value below it, so values are carried from the bottom of the column upward (bottom-up).
  • Down - Each missing value is filled with the last non-missing value above it, so values are carried from the top of the column downward (top-down).

Didn’t get it?

Don’t worry. We will be going through some examples to illustrate the same and you will get to know how things work.
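As a quick preview, here is a minimal sketch of both directions on a tiny data frame. The `toy` data frame and its column `x` are hypothetical and used only for illustration.

```r
library(tidyr)

# Tiny hypothetical column with gaps at the start, middle and end
toy <- data.frame(x = c(NA, 1, NA, 2, NA))

down <- fill(toy, x, .direction = "down")$x
up   <- fill(toy, x, .direction = "up")$x

down  # NA  1  1  2  2   (the leading NA has nothing above it)
up    #  1  1  2  2 NA   (the trailing NA has nothing below it)
```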


5. Filling Missing Values - ‘Up’

Our data frame has 3 columns and 10 records. Before using the fill function to handle the missing data, make sure the following assumption holds for your data:

Sometimes, when data is collected, a value is entered only once to represent a run of consecutive records that share it.

Ex: When collecting ages, if there were 10 people aged 25, you could record 25 against the last person only, indicating that all 10 people are 25.

Please note that this is not the most common situation you will face. But when your data is recorded this way, you can use the fill function to deal with it.
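The age scenario above can be sketched like this. The `ages` data frame is a shortened, hypothetical version of that example (five people instead of ten), where each age is recorded only against the last person of its group.

```r
library(tidyr)

# Hypothetical records: 25 applies to P1-P3, 30 applies to P4-P5
ages <- data.frame(
  person = c("P1", "P2", "P3", "P4", "P5"),
  age    = c(NA, NA, 25, NA, 30)
)

# Carry each recorded age upward to the people above it
ages_filled <- fill(ages, age, .direction = "up")
ages_filled$age  # 25 25 25 30 30
```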

#Dataframe

   a       b  c
1  A   Roger 86
2  B   Carlo NA
3  C    Durn NA
4  D   Jessy NA
5  E Mounica 88
6  F    Rack NA
7  G    Rony NA
8  H    Saly 86
9  I   Kelly NA
10 J  Joseph NA


#Create new dataframe by filling missing values (Up)
df1 <- df %>% fill(c, .direction = 'up')
df1
   a       b  c
1  A   Roger 86
2  B   Carlo 88
3  C    Durn 88
4  D   Jessy 88
5  E Mounica 88
6  F    Rack 86
7  G    Rony 86
8  H    Saly 86
9  I   Kelly NA
10 J  Joseph NA

You can observe that the fill function filled the missing values in the Up direction (bottom-up).

  • There are still 2 NA values in the last rows. This is because, in the Up direction, each NA is replaced by the next non-missing value below it, and rows 9 and 10 have no valid value below them.
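If those trailing NA values also need to be filled, one option is the "updown" direction, which fills upward first and then downward. This is a sketch assuming a recent tidyr release (to my knowledge, "updown" and "downup" were added in tidyr 1.0.0); `df2` is an illustrative name.

```r
library(tidyr)

# Rebuild the sample data frame so the example runs on its own
a <- c('A','B','C','D','E','F','G','H','I','J')
b <- c('Roger','Carlo','Durn','Jessy','Mounica','Rack','Rony','Saly','Kelly','Joseph')
c <- c(86,NA,NA,NA,88,NA,NA,86,NA,NA)
df <- data.frame(a, b, c)

# Fill up first, then fill any remaining gaps downward
df2 <- df %>% fill(c, .direction = 'updown')
df2$c  # 86 88 88 88 88 86 86 86 86 86
```

Whether carrying the last valid value down into rows 9 and 10 is appropriate depends on how the data was recorded, so treat this as an option rather than a default.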

6. Filling Missing Values - ‘Down’

Here, we will use the ‘Down’ method to fill the missing values in the data. The assumption described in the previous section applies here too, so make sure you understand how your data was recorded and what the outcome will be.

#Data


   a       b  c
1  A   Roger 86
2  B   Carlo NA
3  C    Durn NA
4  D   Jessy NA
5  E Mounica 88
6  F    Rack NA
7  G    Rony NA
8  H    Saly 86
9  I   Kelly NA
10 J  Joseph NA


#Creates new dataframe by filling missing values (Down) - (Top-Down approach)

df1 <- df %>% fill(c, .direction = 'down')
df1
   a       b  c
1  A   Roger 86
2  B   Carlo 86
3  C    Durn 86
4  D   Jessy 86
5  E Mounica 88
6  F    Rack 88
7  G    Rony 88
8  H    Saly 86
9  I   Kelly 86
10 J  Joseph 86

  • Here, there are no missing values left. This is because the first row holds a valid value (86), so every NA has a non-missing value somewhere above it. Each valid value is carried down into the NA rows that follow it until the next valid value is reached.
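To make the top-down behaviour described above concrete, here is a plain base R sketch of the same idea. It is not how tidyr implements fill internally; `fill_down` is a hypothetical helper written only to illustrate the logic.

```r
# Hypothetical helper: carry the last non-missing value downward
fill_down <- function(x) {
  # Start at the second element; the first has nothing above it
  for (i in seq_along(x)[-1]) {
    if (is.na(x[i])) {
      x[i] <- x[i - 1]  # reuse the (possibly already filled) value above
    }
  }
  x
}

scores <- c(86, NA, NA, NA, 88, NA, NA, 86, NA, NA)
filled <- fill_down(scores)
filled  # 86 86 86 86 88 88 88 86 86 86
```

A leading NA would stay NA with this helper, which matches the behaviour of the Down direction when there is no valid value above.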

7. Wrapping Up

Filling missing values is an important step when you are analyzing data that contains NA values. Things may seem a bit hard at first, but read through the article once or twice and it will become clear. It’s not a hard cake to digest!

I hope this method will come to your assistance in your future assignments. That’s all for now. Happy R!!! :)

Further reading: Fill function in R
