Strings and Dates in R for Data Preprocessing


It is no secret that data preparation often takes more than half of the total time on a data-driven project, because real-world data is usually messy and unstructured.

Data preprocessing is time-consuming, but it is an important box to tick before you dive into model building. Most of you will have had to deal with dates in your data, and that is not always easy.

However, R offers several useful functions for working with strings and dates. Let’s see how you can use them in your data processing work.

The sections of this article build on one another, so I suggest you follow it from the beginning.


1. Getting started with dates in R

Let’s start the proceedings with a simple task. It will be good to start by getting your system date/current date. Then why wait?

#Prints the current date
Sys.Date()
"2021-06-22"

You can use the Sys.Date() function in R to get the system date, that is, the current date, as shown above. Note that this function returns a Date object, yet the output appears in double quotes. Don’t worry: R converts the Date object into a string only for printing, and the value itself is still a Date. You can confirm that with the code below.

#Display the class 
class(Sys.Date())
"Date"

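If you want to see the difference between a Date and its printed string for yourself, the short sketch below may help. It uses a fixed date instead of Sys.Date() (an assumption made only so the results are repeatable), and the variable name d is just for illustration.

```r
# A fixed date is used here so the results are reproducible
d <- as.Date("2021-06-22")

class(d)            # the object itself is a Date
class(format(d))    # format() turns it into a character string
unclass(d)          # internally, a Date is a count of days since 1970-01-01

# Date arithmetic works because d is not a string
d + 7
```

Adding 7 to d gives the date one week later, which would not be possible if d were only text.
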
2. String to Date conversion in R

In the previous section, we saw that Sys.Date() returns a Date object that is only printed like a string. In real data, however, dates usually arrive as plain character strings. To convert such a string to a date, you can use the as.Date() function in R. But make sure the input string matches the format the function expects.

By default, as.Date() expects the format YYYY-MM-DD (it also accepts YYYY/MM/DD). If your string uses a different format, you should specify it with the format parameter.

#Converts string to date
as.Date("2021-06-22")
"2021-06-22"

Let’s try a different format.

#Change in format - mm/dd/yyyy
as.Date("06/22/2021")
Error in charToDate(x) : 
  character string is not in a standard unambiguous format

Oh! Something went wrong, it seems. Notice that we changed the format of the string and R is throwing an error. In these cases, you have to specify the format explicitly, here mm/dd/yyyy, using format codes such as %m (month), %d (day) and %Y (four-digit year).

#Specifying the format (%Y is the four-digit year; %y would read only the first two digits)
as.Date("06/22/2021", format = "%m/%d/%Y")
"2021-06-22"

Now R understands the format and displays the date.
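
The format parameter takes strptime-style codes. As a rough sketch, here are a few other layouts you might meet. The input strings and the names d1, d2 and d3 are made up for illustration.

```r
# %Y = four-digit year, %y = two-digit year, %m = month, %d = day
d1 <- as.Date("22.06.2021", format = "%d.%m.%Y")
d2 <- as.Date("20210622",   format = "%Y%m%d")
d3 <- as.Date("06/22/21",   format = "%m/%d/%y")

# All three strings describe the same day
d1 == d2 && d2 == d3

# Pitfall: %y reads only two digits, so "2021" is parsed as "20"
wrong_year <- as.Date("06/22/2021", format = "%m/%d/%y")
wrong_year   # the year comes out as 2020, not 2021

# format() goes the other way: Date to string in the layout you choose
format(d1, "%d/%m/%Y")
```

The pitfall in the middle is easy to miss because R does not raise an error; it quietly returns a date with the wrong year.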


3. Year, Month and Day in a Date

If you have your dates represented by different variables such as year, month and day, then you can merge them into a date. As simple as that.

#Combine different variables having date representations into a date
year <- 2021
month <- 06
day <- 22
 
ISOdate(year, month, day)
"2021-06-22 12:00:00 GMT"

You use the built-in ISOdate() function in R for this purpose. Notice that this function also returns a time component (noon GMT by default). If you don’t want the time and need only the date, run the code below.

#Retrieves pure date
as.Date(ISOdate(year,month,day))
"2021-06-22"

#For invalid dates
ISOdate(2022,2,30)
NA

You can see that the as.Date() function returns the pure date by dropping the time component. In the second piece of code, an invalid date results in NA (in our case, February never has 30 days).

You can also store multiple date components in vectors and merge them in the same way.

#Multiple dates
my_years <- c(2021,2022,2023,2024)
my_months <- c(06,07,08,09)
my_days <- c(22,23,24,25)

ISOdate(my_years, my_months,my_days)
"2021-06-22 12:00:00 GMT" 
"2022-07-23 12:00:00 GMT" 
"2023-08-24 12:00:00 GMT"
"2024-09-25 12:00:00 GMT"

#Use of as.Date function for pure dates
as.Date(ISOdate(my_years, my_months, my_days))
"2021-06-22"
"2022-07-23" 
"2023-08-24" 
"2024-09-25"

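In preprocessing, the NA returned for an invalid date is handy for spotting bad rows. Here is a minimal sketch, assuming a small made-up data frame called raw with year, month and day columns (none of this data comes from a real source).

```r
# Hypothetical raw data: the third row (30 February) is not a real date
raw <- data.frame(
  year  = c(2021, 2022, 2022),
  month = c(6, 7, 2),
  day   = c(22, 23, 30)
)

# Build the date column; invalid combinations become NA
raw$date <- as.Date(ISOdate(raw$year, raw$month, raw$day))

# Rows that need attention
raw[is.na(raw$date), ]

# Rows that are safe to use
clean <- raw[!is.na(raw$date), ]
clean$date
```

Whether you drop, correct or investigate the flagged rows depends on your data. The sketch only shows how to find them.
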
4. Date sequences

You can create a sequence of dates in R, stepping by day, week, month, or year. If you know the seq() function, which generates a sequence, it works the same way with dates. Let’s see how it works.

#Generate a sequence of dates
x <- as.Date("2021-06-22")
y <- as.Date("2021-07-22")

seq(from = x, to = y, by = 1)
"2021-06-22" "2021-06-23" "2021-06-24" "2021-06-25" "2021-06-26" "2021-06-27"
"2021-06-28" "2021-06-29" "2021-06-30" "2021-07-01" "2021-07-02" "2021-07-03"
"2021-07-04" "2021-07-05" "2021-07-06" "2021-07-07" "2021-07-08" "2021-07-09"
"2021-07-10" "2021-07-11" "2021-07-12" "2021-07-13" "2021-07-14" "2021-07-15"
"2021-07-16" "2021-07-17" "2021-07-18" "2021-07-19" "2021-07-20" "2021-07-21"
"2021-07-22"

You can see that the seq() function generates one date per day for the whole month. You can play with the ‘by’ parameter by changing the number. Let’s try it with ‘3’.

#Sequence by 3
seq(from = x, to = y, by = 3)
"2021-06-22" "2021-06-25" "2021-06-28" "2021-07-01" "2021-07-04" "2021-07-07"
"2021-07-10" "2021-07-13" "2021-07-16" "2021-07-19" "2021-07-22"

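The ‘by’ parameter is not limited to a number of days. As a small sketch using the same x and y, seq() also accepts units such as "week" and "month", a length.out count instead of an end date, and a negative step for counting backwards. The names weekly, monthly and backwards are only illustrative.

```r
x <- as.Date("2021-06-22")
y <- as.Date("2021-07-22")

# One date per week between x and y
weekly <- seq(from = x, to = y, by = "week")

# The same day of each month, six values starting at x
monthly <- seq(from = x, by = "month", length.out = 6)

# Counting backwards works with a negative step
backwards <- seq(from = y, to = x, by = -10)

weekly
monthly
backwards
```

Sequences like these are useful when you need a complete calendar to join against, for example to find missing days in a time series.
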
Good job! You have done well so far and worked through several aspects of strings and dates in R. Now tell me, is working with strings and dates in R still a headache for you?


Ending note – Strings and Dates in R

Data preprocessing is always a time-consuming task, and an important one. If you are working with time-series data, you need to be especially careful with dates and their operations. In this article, I tried to explain some of the key concepts that will help you in your data processing work. I hope that by now you have a good grasp of strings and dates in R.

That’s all for now, Happy R!!!

Further reading: R documentation
