na.rm and is.na in R Programming | Find Missing Information

If your daily tasks involve working with data, then words like “messy” and “tidy” might not be new to you. Most data-driven work requires a fair amount of time to transform the data before it can be analyzed.

A big part of that work is dealing with missing information. In R, missing values are represented as “NA”, and you will run into them very often. NA means NOT AVAILABLE, which means the particular value is unknown. The value may have been lost, damaged, or gone missing for some other reason.

But you need not worry about it. R offers multiple ways to deal with missing values. In this article, we will be talking about na.rm and is.na in R. Let’s roll!!!


What are NA values in R?

In R, NA is a special value that acts as a placeholder for missing information in the data. It is a reserved word, so R immediately recognizes it as missing rather than treating it as an ordinary value.
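To show what “special” means here, the minimal sketch below inspects NA with a few base R functions. On its own, NA is a logical constant. Inside a vector, it takes on that vector's type. The text "NA" is an ordinary string and is not missing at all. The variable name x is only for illustration, and is.na() is covered later in this article.

```r
# NA on its own is a logical constant of length 1
typeof(NA)            # "logical"

# Inside a numeric vector, NA is stored as a missing double
x <- c(1, NA, 3)
typeof(x)             # "double"

# Typed NA constants also exist
typeof(NA_integer_)   # "integer"
typeof(NA_character_) # "character"

# The text "NA" is an ordinary string, not a missing value
is.na("NA")           # FALSE
is.na(NA_character_)  # TRUE
```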

You expect every operation to produce a result. So, what do you expect when you add 1 to a missing value?

#example
1 + NA 
 NA

Did you expect something other than NA?

R returns NA because the missing value could be anything, so the result of adding 1 to it is unknown as well. You cannot say 1 + NA = 1 because we don’t know what the value of NA is.
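The same logic applies element by element to vectors, so only the missing position becomes NA. A few operations are exceptions because their result is the same whatever the missing value turns out to be. The short sketch below illustrates both cases using only base R.

```r
# Arithmetic propagates NA only at the missing position
c(1, 2, NA) + 1     # 2 3 NA

# Exceptions: the answer does not depend on the unknown value
NA^0                # 1, since any number to the power 0 is 1
NA || TRUE          # TRUE, since "anything OR TRUE" is TRUE
FALSE && NA         # FALSE, since "FALSE AND anything" is FALSE

# When the unknown value does matter, the result is NA again
NA || FALSE         # NA
```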

Let me try this another way!

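Let’s try comparing NA with 1 using NA == 1. The == operator only compares the two values and does not set NA to 1. Because R does not know the missing value, the comparison itself returns NA. Now, repeat the previous drill.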

#Example 2
#Compares NA with 1 (== compares; it does not assign)
NA == 1
NA

#Adds 1 to NA
1 + NA
NA

Even after comparing NA with 1, we still don’t know whether NA is 1 or some other value. This question will persist until you take explicit action on the NAs.

These drills are just simple examples. In real analyses, however, NAs can quietly spread through your calculations and lead to errors, so it is important to deal with them as early as possible.
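Control flow is one common place where NAs cause real errors. An if() condition must be TRUE or FALSE, so a comparison that evaluates to NA stops the code. The sketch below uses a hypothetical value x, catches that error with tryCatch(), and stores R's error message instead. It then shows how checking with is.na() first avoids the problem.

```r
x <- NA  # hypothetical value that turned out to be missing

# if() cannot choose a branch when its condition is NA, so it raises an error
result <- tryCatch(
  if (x == 1) "x is 1" else "x is not 1",
  error = function(e) conditionMessage(e)
)
print(result)  # "missing value where TRUE/FALSE needed"

# Testing for NA first avoids the error
if (is.na(x)) "x is missing" else if (x == 1) "x is 1" else "x is not 1"
```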


Using na.rm in R

NAs can help you understand your data and decide on replacements that make sense. Having said that, NAs can also be irritating.

Assume you have data that includes 100 values and 1 NA. Ironically, this single NA is enough to cause issues. If you compute the mean, median, or sum of this data, the result will be NA.

#Creates values from 1 to 100 with an NA
Dummy <- c(NA, 1:100)
Dummy
NA   1   2   3   4   5   6   7   8   9  10  11  12  13  14  15  16  17  18
19  20  21  22  23  24  25  26  27  28  29  30  31  32  33  34  35  36  37
38  39  40  41  42  43  44  45  46  47  48  49  50  51  52  53  54  55  56
57  58  59  60  61  62  63  64  65  66  67  68  69  70  71  72  73  74  75
76  77  78  79  80  81  82  83  84  85  86  87  88  89  90  91  92  93  94
95  96  97  98  99 100

We have 101 values: the numbers 1 to 100 plus one NA (Not Available). Let’s try to compute the mean for this data.

#Computes mean for this data
mean(Dummy)
 NA

Oh! The mean is NA. Just observe how a single NA can spoil your result. You need to handle NAs to avoid these errors.

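For this purpose, many R functions, such as mean(), median(), sum(), min(), and max(), offer the na.rm argument. na.rm stands for “NA remove”. When it is set to TRUE, NA values are removed before the computation. By default, it is FALSE. Let’s try this argument with the previous sample.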

#mean with na.rm argument
mean(Dummy, na.rm = TRUE)
50.5

Wow! You got it. With na.rm = TRUE, R ignores the NA values, performs the operation on the remaining values, and returns the result. Isn’t it cool?
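na.rm is not specific to mean(). The sketch below rebuilds the same Dummy vector and uses it with a few other base R summary functions that accept the argument. Note that na.rm = TRUE only skips NAs during the calculation. The vector itself still contains the NA afterward.

```r
# Same data as above: one NA plus the numbers 1 to 100
Dummy <- c(NA, 1:100)

# Without na.rm, each summary is NA
median(Dummy)                # NA
sum(Dummy)                   # NA

# With na.rm = TRUE, the NA is skipped
median(Dummy, na.rm = TRUE)  # 50.5
sum(Dummy, na.rm = TRUE)     # 5050
max(Dummy, na.rm = TRUE)     # 100

# The original vector is unchanged: it still has 101 elements
length(Dummy)                # 101
```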


Using is.na in R

What would you do if you were lost in a big dataset and wanted to find the NAs? You might say a logical test can help you. Hmm, fine, let’s try it.

#Logical test
NA == NA
NA

#Logical test with a vector
dummy <- c(1,2,3,NA) == NA
dummy
NA NA NA NA

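It doesn’t work. Comparing anything with NA using == returns NA, because R cannot know whether the unknown value equals the other one. Even NA == NA returns NA.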

But don’t worry. R offers the is.na() function to identify NAs in the data. It returns TRUE for each element that is NA and FALSE otherwise. Let’s try this.

#Identifies NA's logically
is.na(c(1,2,3,NA))
FALSE FALSE FALSE  TRUE

Amazing! You did it. It’s very satisfying to see that R can identify the NAs logically.
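Because is.na() returns a logical vector, it combines naturally with other base R functions. The sketch below uses a small illustrative vector v. It counts the NAs, finds their positions, and keeps only the non-missing values.

```r
v <- c(1, 2, NA, 4, NA)  # illustrative vector with two NAs

sum(is.na(v))     # 2: TRUE counts as 1, FALSE as 0
which(is.na(v))   # 3 5: positions of the NAs
v[!is.na(v)]      # 1 2 4: keep only the non-missing values
anyNA(v)          # TRUE: quick check for at least one NA
```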

We can now try this on a real-world dataset.

#is.na with the airquality dataset
df <- datasets::airquality
df

is.na(df)  #only the first 10 of 153 rows are shown below
Ozone Solar.R  Wind  Temp Month   Day
FALSE   FALSE FALSE FALSE FALSE FALSE
FALSE   FALSE FALSE FALSE FALSE FALSE
FALSE   FALSE FALSE FALSE FALSE FALSE
FALSE   FALSE FALSE FALSE FALSE FALSE
TRUE     TRUE FALSE FALSE FALSE FALSE
FALSE    TRUE FALSE FALSE FALSE FALSE
FALSE   FALSE FALSE FALSE FALSE FALSE
FALSE   FALSE FALSE FALSE FALSE FALSE
FALSE   FALSE FALSE FALSE FALSE FALSE
TRUE    FALSE FALSE FALSE FALSE FALSE

As you can see, is.na() returns TRUE wherever a value is missing. For example, Ozone is missing in rows 5 and 10, and Solar.R is missing in rows 5 and 6. You can also check an individual column rather than the whole dataset. The code below shows how.

#Execute each line individually

is.na(df$Ozone)
is.na(df$Solar.R)
is.na(df$Wind)
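Checking columns one by one gets tedious on wider data. The sketch below offers a more compact alternative using only base R. It sums the logical matrix from is.na() column by column, then uses complete.cases() to find the rows that have no missing values.

```r
df <- datasets::airquality

# Number of NAs in each column (TRUE counts as 1)
colSums(is.na(df))

# Flag the rows without any NA, and count them
complete <- complete.cases(df)
sum(complete)

# Keep only the complete rows
df_complete <- df[complete, ]
nrow(df_complete)
```

For airquality, the column counts show that only Ozone and Solar.R contain NAs.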

That’s all about the is.na() function and its use in data preprocessing. As shown, na.rm and is.na() help in different ways: is.na() finds the NAs, and na.rm lets you compute summaries without them.
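To see how the two tools fit together, the sketch below computes the mean of the Ozone column in the airquality data. It shows that na.rm = TRUE gives the same result as removing the NAs yourself with is.na() before calling mean().

```r
ozone <- datasets::airquality$Ozone

mean(ozone)                      # NA, because Ozone has missing values

# Option 1: let mean() skip the NAs
m1 <- mean(ozone, na.rm = TRUE)

# Option 2: drop the NAs yourself with is.na(), then take the mean
m2 <- mean(ozone[!is.na(ozone)])

all.equal(m1, m2)                # TRUE
```

In everyday use, na.rm = TRUE is the shorter option. The is.na() route is handy when you also want to inspect or count what was removed.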


Ending note

na.rm and is.na in R are two of the most useful tools for dealing with NAs in R. Always apply proper missing-value treatment to avoid a potential loss of information. Don’t stop here. Try them with different datasets and also look for other methods to deal with NAs in your data. That’s all for now. Happy R!!!

Further reading: R documentation