The sub() and gsub() functions in R


You can replace strings or characters in a vector or a data frame using the sub() and gsub() functions in R.

Hello folks, today we are going to focus on two of the most useful functions in R: sub() and gsub().

The sub() and gsub() functions in R replace text that matches a pattern with a string you specify. Both of them interpret the pattern as a regular expression by default, which makes them far more flexible than a plain find-and-replace. Cool, right?

Let’s move forward and explore these functions using relevant illustrations.


Syntax of sub() and gsub()

sub() and gsub() are R's functions for string substitution. You can use them to find a pattern in a character vector (or in the columns of a data frame) and replace it with the string you specify.

sub(pattern, replacement, x)
gsub(pattern, replacement, x)

Where,

  • pattern = The pattern to search for. By default it is treated as a regular expression.
  • replacement = The string that replaces each match.
  • x = A character vector (or an object that can be coerced to one, such as a data frame column) in which to search.
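Both functions also accept optional arguments that the short syntax above leaves out, among them ignore.case and fixed (run ?sub for the full list). The sketch below, using a made-up string x, shows how the pattern is read as a regular expression by default and as a literal string with fixed = TRUE.

```r
x <- "a.b.c"

# By default the pattern is a regular expression: "." matches any
# character, so sub() replaces the very first character, "a".
regex_first <- sub(".", "!", x)

# fixed = TRUE treats the pattern as a literal dot instead.
fixed_first <- sub(".", "!", x, fixed = TRUE)
fixed_all   <- gsub(".", "!", x, fixed = TRUE)

# ignore.case = TRUE makes the match case-insensitive.
no_case <- gsub("r", "R", "r and R", ignore.case = TRUE)

print(c(regex_first, fixed_first, fixed_all, no_case))
```

This prints "!.b.c", "a!b.c", "a!b!c" and "R and R". Reach for fixed = TRUE whenever the text you are replacing contains characters such as ".", "(" or "$" that have a special meaning in regular expressions.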

The sub() function in R

The sub() function in R replaces the first match of a pattern in each element of a character vector with the string you specify.

When you are dealing with large data sets, it's impossible to look through each line by hand to find and replace the target words or strings.

In that case, the sub() function can do the replacement for you.

Keep in mind, though, that sub() replaces only the first occurrence of the pattern in each string and leaves all later occurrences untouched.

Complicated? Don’t worry. Let’s illustrate this using a simple example.


1. A simple implementation of sub() function

In this example, we are going to replace the string with our input string in a vector. Let’s see how it goes.

# an input vector (a single string)
x <- "R is a collaborative project with many contributors"

# replace the first match of 'R'
sub('R', 'R language', x)
[1] "R language is a collaborative project with many contributors"

In the above example, you can see that the sub() function replaces the string ‘R’ in the vector with the ‘R language’ string which is specified in the code as a replacement.

Let's look at another example to understand it even better.

# a vector
x <- "The Earth surface is 71% water covered. Earth has 29 % of land"

# use sub() to replace the first 'Earth' with 'Planetary'
sub('Earth', 'Planetary', x)
[1] "The Planetary surface is 71% water covered. Earth has 29 % of land"

In this example, you can observe that the sub() function replaced the first occurrence of the string 'Earth' with 'Planetary'. The next occurrence, however, remains the same.

As discussed above, the sub() function does not replace every match; it replaces only the first occurrence of the pattern.
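One detail worth spelling out: the "first occurrence" is counted separately for every element of a character vector. A small sketch with a made-up fruits vector, with gsub() shown alongside for contrast:

```r
fruits <- c("banana", "apple", "cherry")

# sub(): only the first "a" in each element is replaced
first_only <- sub("a", "A", fruits)

# gsub(): every "a" in each element is replaced
every_match <- gsub("a", "A", fruits)

print(first_only)   # "bAnana" "Apple"  "cherry"
print(every_match)  # "bAnAnA" "Apple"  "cherry"
```

So with a vector of many strings, sub() still touches every element that contains the pattern, just once per element.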

I hope that is clear by now.


2. sub() function with a data frame

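If you pass a whole data frame to sub(), it still replaces only the first occurrence, but with a twist: sub() first converts the data frame to a character vector with as.character(), which turns each column into a single string.

So the first match is replaced once per column, not once per row, and all other matches are left as they are.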

Let’s see how it works!

For this we have to create a data frame first. Then we can use sub() function to get the results.

#creating a data frame
df<-data.frame(Column_1 = c('Florida','Germany','Georgia','Geniva','Istanbul','NewZealand','Australia'), Column_2=c(1,2,3,4,5,6,7))

#data frame
df
      Column_1     Column_2
1      Florida        1
2      Germany        2
3      Georgia        3
4       Geniva        4
5     Istanbul        5
6   NewZealand        6
7    Australia        7
# replace the first 'G' with 'A'
sub('G', 'A', df)
[1] "c(\"Florida\", \"Aermany\", \"Georgia\", \"Geniva\", \"Istanbul\", \"NewZealand\", \"Australia\")"
[2] "c(1, 2, 3, 4, 5, 6, 7)"

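You can see that the function changed only the first occurrence ('Germany') and left the others alone. That is because we passed the entire data frame: each column became one long string, and the result is a character vector rather than a data frame.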
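To see what sub() is actually working on, you can call as.character() on the data frame yourself. This sketch uses a shortened, three-row version of the data frame above; stringsAsFactors = FALSE is set only so that older R versions (before 4.0) behave the same way.

```r
df <- data.frame(Column_1 = c('Florida', 'Germany', 'Georgia'),
                 Column_2 = c(1, 2, 3),
                 stringsAsFactors = FALSE)

# sub() coerces its x argument with as.character(). For a data frame this
# deparses each column into one string, so there are only two "elements"
# here: one per column.
as_text <- as.character(df)
print(as_text)

# Only the first "G" in each of those two strings is replaced.
print(sub('G', 'A', df))
```

The first print shows two strings, 'c("Florida", "Germany", "Georgia")' and "c(1, 2, 3)", which is why only "Germany" changed in the full example.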

To replace the first 'G' in every name instead, select the particular column, as shown below.

#substituting the values
sub('G','A',df$Column_1)
"Florida"   
"Aermany"  
"Aeorgia"   
"Aeniva"  
"Istanbul" 
"NewZealand"
"Australia" 

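Like this, you can easily substitute values in a data frame column. Note that sub() returns a new vector; the data frame itself is not changed unless you assign the result back to the column.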
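A minimal sketch of keeping the change inside the data frame, again on a shortened copy of the example data. The first step writes the sub() result back to Column_1; the second step, which builds on the first, runs gsub() over every character column with lapply() and skips numeric columns, since sub() and gsub() would otherwise turn them into text. The 'a' to '@' replacement is only an illustration.

```r
df <- data.frame(Column_1 = c('Florida', 'Germany', 'Georgia'),
                 Column_2 = c(1, 2, 3),
                 stringsAsFactors = FALSE)

# Step 1: assign the result back to the column you changed.
df$Column_1 <- sub('G', 'A', df$Column_1)

# Step 2: apply gsub() to every character column at once and leave the
# numeric columns untouched.
is_chr <- vapply(df, is.character, logical(1))
df[is_chr] <- lapply(df[is_chr], function(col) gsub('a', '@', col))

print(df)
```

After both steps Column_1 holds "Florid@", "Aerm@ny" and "Aeorgi@", while Column_2 is still numeric.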

In the next section, we are going to see how the gsub() function can be used in R.


The gsub() function in R

The gsub() function in R works just like sub(), except that it replaces every match of the pattern in each string, not just the first one.

Like sub(), gsub() treats the pattern as a regular expression by default (pass fixed = TRUE to match it literally), so you can use regular expressions to describe what should be substituted.

A regular expression is simply a sequence of characters that describes a search pattern in the data.
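The difference between the two functions is easiest to see side by side on the same input. A small sketch with an illustrative string:

```r
x <- "one fish, two fish"

first_match <- sub("fish", "cat", x)   # replaces only the first match
all_matches <- gsub("fish", "cat", x)  # replaces every match

print(first_match)  # "one cat, two fish"
print(all_matches)  # "one cat, two cat"
```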

The sections below show how the gsub() function is used in R.


1. A simple implementation of gsub() function

The gsub() function in R replaces every match of a pattern with the input string or value. Note that you can also use regular expressions with gsub(), for example to deal with numbers.

# a vector
x <- "I love R. The R is a statistical analysis language"

This is data that has 'R' written multiple times. Now, we are going to replace every 'R' with 'R programming' in both sentences using the gsub() function.

# substitute the values using gsub()
gsub('R', 'R programming', x)
[1] "I love R programming. The R programming is a statistical analysis language"

Fantastic!

See how quickly the word 'R' in both sentences gets replaced by 'R programming'.

The gsub() function finds every match of the pattern and replaces it with our input string. Note that the pattern matches characters, not whole words: 'R' would also match the R inside a word such as 'RStudio'.
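If you really want whole words only, one option is a word-boundary anchor, \b. This sketch uses an illustrative string and sets perl = TRUE so the Perl-compatible regex engine, where \b is standard, handles the pattern. In an R string the backslash itself has to be escaped, hence "\\b".

```r
x <- "R and RStudio"

# A plain pattern matches "R" anywhere, including inside "RStudio".
plain <- gsub("R", "R programming", x)

# \\b marks a word boundary, so only the stand-alone "R" is replaced.
bounded <- gsub("\\bR\\b", "R programming", x, perl = TRUE)

print(plain)    # "R programming and R programmingStudio"
print(bounded)  # "R programming and RStudio"
```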


2. gsub() function with regular expression

As the heading suggests, you can use regular expressions with the gsub() function without any hassle.

For example, you can remove the numbers from the data using a regular expression.

Regular expressions (regex): Also called rational expressions, they are sequences of characters that define a search pattern. They are widely used by search algorithms and originate in formal language theory in computer science.

Let’s see how it works.

# a vector containing numeric values
x <- "I was born on June 5,1998"
# remove every run of digits
gsub('[0-9]+', '', x)
[1] "I was born on June ,"

So, basically, the gsub() function searches for digits in the data and replaces them with an empty string, which removes them. The pattern [0-9] matches any single digit, and + matches one or more of them in a row.
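A few more patterns on the same string, as a sketch: a negated character class keeps only the digits, and parentheses capture parts of the match that the replacement can reuse as \\1, \\2 and so on. The date-reordering pattern is just an illustration written for this one string.

```r
x <- "I was born on June 5,1998"

# "+" requires at least one digit per match.
no_digits <- gsub("[0-9]+", "", x)

# [^0-9] matches any character that is NOT a digit, so only digits remain.
only_digits <- gsub("[^0-9]", "", x)

# Three capture groups: month name, day, year. The replacement reorders them.
reordered <- sub("([[:alpha:]]+) ([0-9]+),([0-9]+)", "\\2 \\1 \\3", x)

print(no_digits)    # "I was born on June ,"
print(only_digits)  # "51998"
print(reordered)    # "I was born on 5 June 1998"
```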


3. The gsub() function with data frames

Like the sub() function, gsub() substitutes matches with the input values. One interesting application, shown below, highlights how useful the gsub() function can be in R.

Let’s roll!!!

#creating a dataframe
df<-data.frame(Speaker=c('Abraham','Wassimo','Fredrick','Richard','Ravish','Rubina','Laura'),Age=c(45,47,39,33,36,28,30))
#data frame
df
    Speaker    Age
1   Abraham    45
2   Wassimo    47
3  Fredrick    39
4   Richard    33
5    Ravish    36
6    Rubina    28
7     Laura    30

Well, now we have a list of speakers and their ages as input data.

Now, we are going to use a regular expression with gsub() to add the prefix 'Mr/Mrs.' at the start of each name. The pattern '^' matches the empty position at the beginning of a string, so the replacement is inserted in front of the name. Let's do it together.

gsub('^', 'Mr/Mrs.', df$Speaker)
"Mr/Mrs.Abraham"  
"Mr/Mrs.Wassimo" 
"Mr/Mrs.Fredrick" 
"Mr/Mrs.Richard" 
"Mr/Mrs.Ravish"  
"Mr/Mrs.Rubina" 
"Mr/Mrs.Laura"   

Awesome. You did it.

See how easily you have added the prefix in front of the speaker names. Cool, right?
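The same anchor trick works at the other end of a string with '$', and for a fixed prefix paste0() gives the same result without any regular expression. A short sketch on a few of the speaker names; the ' (speaker)' suffix is made up for illustration.

```r
speakers <- c('Abraham', 'Wassimo', 'Laura')

# "^" matches the empty position at the start, so the prefix is inserted.
with_prefix <- sub('^', 'Mr/Mrs.', speakers)

# "$" matches the empty position at the end, so the suffix is appended.
with_suffix <- sub('$', ' (speaker)', speakers)

# For a constant prefix, paste0() produces the same strings.
same_prefix <- paste0('Mr/Mrs.', speakers)

print(with_prefix)  # "Mr/Mrs.Abraham" "Mr/Mrs.Wassimo" "Mr/Mrs.Laura"
print(with_suffix)  # "Abraham (speaker)" "Wassimo (speaker)" "Laura (speaker)"
```

The regex version pays off once the prefix depends on what the string contains; for a constant prefix, paste0() is arguably easier to read.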


Wrapping Up

The sub() and gsub() functions in R are used for substitution (find-and-replace) operations.

The sub() function replaces only the first match in each string and leaves the rest as they are. The gsub() function, on the other hand, replaces every match.

Apart from that, the two functions take the same arguments, so choose sub() when only the first match should change and gsub() when all of them should.

I hope you now have a better understanding of the sub() and gsub() functions in R. That's all for now. Happy substituting!!!

Further reading: R documentation (run ?sub in the R console).
