
How To Replace Values Using `replace()` and `is.na()` in R

Updated on December 20, 2022

By Prajwal CN


Introduction

In data analysis, you may need to address missing, negative, or inaccurate values in a dataset. One way to handle them is to replace the values with 0, NA, or the mean.

In this article, you will explore how to use the replace() and is.na() functions in R.

Prerequisites

To complete this tutorial, you will need a working installation of R.

Replacing the Values in a Vector with replace()

This section will show how to replace a value in a vector.

The syntax of the replace() function takes the target vector, an index vector, and the replacement values:

replace(target, index, replacement)

First, create a vector:

df <- c('apple', 'orange', 'grape', 'banana')
df

This will create a vector with apple, orange, grape, and banana:

Output
"apple" "orange" "grape" "banana"

Now, let’s replace the second item in the vector:

dy <- replace(df, 2, 'blueberry')
dy

This will replace orange with blueberry:

Output
"apple" "blueberry" "grape" "banana"

Now, we’ll replace the fourth item in the vector:

dx <- replace(dy, 4, 'cranberry')
dx

This will replace banana with cranberry:

Output
"apple" "blueberry" "grape" "cranberry"
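
The index argument is not limited to a single position. As a minimal sketch (the fruits vector and the replacement names below are made up for illustration), replace() also accepts a vector of positions or a logical condition, and it returns a modified copy rather than changing the original:

```r
# Hypothetical example vector, not part of the tutorial's data
fruits <- c('apple', 'orange', 'grape', 'banana')

# Replace several positions at once by passing an index vector
multi <- replace(fruits, c(1, 3), c('kiwi', 'melon'))

# Replace by condition: a logical vector selects the matching elements
by_cond <- replace(fruits, fruits == 'banana', 'cherry')

# replace() returns a new vector; the original is left unchanged
print(multi)
print(by_cond)
print(fruits)
```

Because the original vector is untouched, the result has to be assigned to a name (as with dy and dx above) for the change to be kept.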

Replacing NA Values with 0 in R

Consider a scenario where you have a data frame containing measurements:

air_quality
    Ozone  Solar.R  Wind  Temp  Month  Day
1      41      190   7.4    67      5    1
2      36      118   8.0    72      5    2
3      12      149  12.6    74      5    3
4      18      313  11.5    62      5    4
5      NA       NA  14.3    56      5    5
6      28       NA  14.9    66      5    6
7      23      299   8.6    65      5    7
8      19       99  13.8    59      5    8
9       8       19  20.1    61      5    9
10     NA      194   8.6    69      5   10
11      7       NA   6.9    74      5   11
12     16      256   9.7    69      5   12

Here is the data in CSV format:

air_quality.csv
Ozone,Solar.R,Wind,Temp,Month,Day
41,190,7.4,67,5,1
36,118,8.0,72,5,2
12,149,12.6,74,5,3
18,313,11.5,62,5,4
NA,NA,14.3,56,5,5
28,NA,14.9,66,5,6
23,299,8.6,65,5,7
19,99,13.8,59,5,8
8,19,20.1,61,5,9
NA,194,8.6,69,5,10
7,NA,6.9,74,5,11
16,256,9.7,69,5,12

This contains the string NA for “Not Available” for situations where the data is missing.
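
To see how R treats these entries, the following sketch reads three of the rows above from an inline string through the text argument of read.csv() instead of a file (an assumption made here so the example is self-contained) and counts the missing values per column. The names csv_text, sample_df, na_mask, and na_counts are illustrative only.

```r
# Rows 1, 5, and 6 of the data above, supplied inline via the text argument
csv_text <- "Ozone,Solar.R,Wind,Temp,Month,Day
41,190,7.4,67,5,1
NA,NA,14.3,56,5,5
28,NA,14.9,66,5,6"

# By default, read.csv() interprets the string NA as a missing value
sample_df <- read.csv(text = csv_text)

# is.na() returns a logical matrix with TRUE wherever a value is missing
na_mask <- is.na(sample_df)

# colSums() counts the TRUE entries, that is, the NAs in each column
na_counts <- colSums(na_mask)
print(na_counts)
```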

You can replace the NA values with 0.

First, define the data frame:

df <- read.csv('air_quality.csv')

Use is.na() to check if a value is NA. Then, replace the NA values with 0:

df[is.na(df)] <- 0
df

The data frame is now:

Output
    Ozone  Solar.R  Wind  Temp  Month  Day
1      41      190   7.4    67      5    1
2      36      118   8.0    72      5    2
3      12      149  12.6    74      5    3
4      18      313  11.5    62      5    4
5       0        0  14.3    56      5    5
6      28        0  14.9    66      5    6
7      23      299   8.6    65      5    7
8      19       99  13.8    59      5    8
9       8       19  20.1    61      5    9
10      0      194   8.6    69      5   10
11      7        0   6.9    74      5   11
12     16      256   9.7    69      5   12

All occurrences of NA in the data frame have been replaced.
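
If only one column should be changed, the same idea can be limited to that column. The sketch below uses a small made-up data frame (small_df is a hypothetical name, not the tutorial's file) and combines replace() with is.na():

```r
# Small hypothetical data frame with missing values in two columns
small_df <- data.frame(Ozone = c(41, NA, 28), Solar.R = c(190, NA, NA))

# Replace NA with 0 in the Ozone column only; Solar.R is left untouched
small_df$Ozone <- replace(small_df$Ozone, is.na(small_df$Ozone), 0)
print(small_df)
```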

Replacing NA Values with the Mean of the Values in R

Replacing NA values with the mean of the remaining values in a column is a common alternative to replacing them with 0. It keeps every row in the data set and leaves the column mean unchanged, although each replacement is an estimate rather than a measured value. The mean() function calculates the mean value.

Consider the following input data set with NA values:

air_quality
    Ozone  Solar.R  Wind  Temp  Month  Day
1      41      190   7.4    67      5    1
2      36      118   8.0    72      5    2
3      12      149  12.6    74      5    3
4      18      313  11.5    62      5    4
5      NA       NA  14.3    56      5    5
6      28       NA  14.9    66      5    6
7      23      299   8.6    65      5    7
8      19       99  13.8    59      5    8
9       8       19  20.1    61      5    9
10     NA      194   8.6    69      5   10
11      7       NA   6.9    74      5   11
12     16      256   9.7    69      5   12

Read the CSV file again so that df contains the original NA values:

df <- read.csv('air_quality.csv')

Use is.na() and mean() to replace NA:

df$Ozone[is.na(df$Ozone)] <- mean(df$Ozone, na.rm = TRUE)

First, this code finds all the occurrences of NA in the Ozone column. Next, it calculates the mean of the values in the Ozone column, using the na.rm = TRUE argument to exclude the NA values. Then each instance of NA is replaced with the calculated mean.

Then round() the values to whole numbers:

df$Ozone <- round(df$Ozone, digits = 0)

The data frame is now:

Output
    Ozone  Solar.R  Wind  Temp  Month  Day
1      41      190   7.4    67      5    1
2      36      118   8.0    72      5    2
3      12      149  12.6    74      5    3
4      18      313  11.5    62      5    4
5      21       NA  14.3    56      5    5
6      28       NA  14.9    66      5    6
7      23      299   8.6    65      5    7
8      19       99  13.8    59      5    8
9       8       19  20.1    61      5    9
10     21      194   8.6    69      5   10
11      7       NA   6.9    74      5   11
12     16      256   9.7    69      5   12

The NA values in the Ozone column are now replaced by the rounded mean of the values in the Ozone column (21).
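
The same steps can be wrapped in a small helper and applied to more than one column. This is a sketch only: impute_mean, aq, and cols are hypothetical names, and the data frame is an inline copy of the Ozone and Solar.R columns above rather than the CSV file.

```r
# Hypothetical helper: replace NA in a numeric vector with the rounded mean
impute_mean <- function(x) {
  replace(x, is.na(x), round(mean(x, na.rm = TRUE)))
}

# Ozone and Solar.R columns from the data set above
aq <- data.frame(
  Ozone   = c(41, 36, 12, 18, NA, 28, 23, 19, 8, NA, 7, 16),
  Solar.R = c(190, 118, 149, 313, NA, NA, 299, 99, 19, 194, NA, 256)
)

# Apply the helper to each listed column and write the results back
cols <- c('Ozone', 'Solar.R')
aq[cols] <- lapply(aq[cols], impute_mean)
print(aq)
```

Each column is imputed with its own mean, so a missing Solar.R value is filled from the Solar.R column, never from Ozone.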

Replacing the Negative Values with 0 or NA in R

In the data analysis process, sometimes you will want to replace the negative values in the data frame with 0 or NA. Negative values are often invalid for quantities that cannot fall below zero, such as counts, and leaving them in can skew results such as sums and means.

Consider the following input data set with negative values:

negative_values
    count  entry1  entry2  entry3
 1      1     345    -234     345
 2      2      65     654     867
 3      3      23     345    3456
 4      4      87     867       9
 5      5    2345      34     867
 6      6     876      98      76
 7      7      35    -456     123
 8      8      87      98     345
 9      9    -765      67     765
10     10    4567     -87     234

Here is the data in CSV format:

negative_values.csv
count,entry1,entry2,entry3
1,345,-234,345
2,65,654,867
3,23,345,3456
4,87,867,9
5,2345,34,867
6,876,98,76
7,35,-456,123
8,87,98,345
9,-765,67,765
10,4567,-87,234

Read the CSV file:

df <- read.csv('negative_values.csv')

Replacing the Negative Values with 0

Use replace() to change the negative values in the entry2 column to 0:

data_zero <- df
data_zero$entry2 <- replace(df$entry2, df$entry2 < 0, 0) 
data_zero

The data frame is now:

Output
    count  entry1  entry2  entry3
 1      1     345       0     345
 2      2      65     654     867
 3      3      23     345    3456
 4      4      87     867       9
 5      5    2345      34     867
 6      6     876      98      76
 7      7      35       0     123
 8      8      87      98     345
 9      9    -765      67     765
10     10    4567       0     234

The negative values in the entry2 column have been replaced with 0.
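
As an aside, base R's pmax() offers another way to get the same result for the 0 case. This is a swapped-in technique, not the replace() approach used above: pmax() returns the element-wise maximum, so comparing against 0 raises every negative value to 0. The sketch uses an inline copy of the entry2 column, and the name clamped is illustrative.

```r
# entry2 column from the data set above
entry2 <- c(-234, 654, 345, 867, 34, 98, -456, 98, 67, -87)

# Element-wise maximum with 0: negatives become 0, other values are kept
clamped <- pmax(entry2, 0)

# The replace() approach from this section, for comparison
replaced <- replace(entry2, entry2 < 0, 0)

print(clamped)
print(identical(clamped, replaced))
```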

Replacing the Negative Values with NA

Use replace() to change the negative values in the entry2 column to NA:

data_na <- df
data_na$entry2 <- replace(df$entry2, df$entry2 < 0, NA)
data_na

The data frame is now:

Output
    count  entry1  entry2  entry3
 1      1     345      NA     345
 2      2      65     654     867
 3      3      23     345    3456
 4      4      87     867       9
 5      5    2345      34     867
 6      6     876      98      76
 7      7      35      NA     123
 8      8      87      98     345
 9      9    -765      67     765
10     10    4567      NA     234

The negative values in the entry2 column have been replaced with NA.
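
Note that the -765 in the entry1 column is still present, because only entry2 was changed. To handle every column at once, logical indexing on the whole data frame works the same way as the is.na() example earlier. A minimal sketch, using an inline copy of four rows of the data (neg_df and all_na are hypothetical names):

```r
# Rows 1, 2, 9, and 10 of the data set above
neg_df <- data.frame(
  count  = c(1, 2, 9, 10),
  entry1 = c(345, 65, -765, 4567),
  entry2 = c(-234, 654, 67, -87),
  entry3 = c(345, 867, 765, 234)
)

# neg_df < 0 is a logical matrix; assigning through it changes every match
all_na <- neg_df
all_na[all_na < 0] <- NA
print(all_na)
```

This assumes every column is numeric; a data frame with text columns would need the comparison restricted to its numeric columns.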

Conclusion

Replacing values in a data frame is a convenient option available in R for data analysis. Using replace() and is.na() in R, you can replace NA and negative values with 0, NA, or the mean when appropriate to clean up datasets for analysis.

Continue your learning with How To Use sub() and gsub() in R.
