Hello, readers! In this article, we would be focusing on 2 Easy ways of Merge Data Frames in R programming.
So, let us begin!
Table of Contents
1. Merge data frames using merge() function
Let us now learn to merge data frames in R. R provides us with in-built merge() function
to combine two data frames easily.
At first, we create two data frames using data.frame() function
as discussed above. We need to remember that all the data frames on which we will perform merging, needs to have a common column name.
Syntax:
merge(data_frame1, data_frame2, by.x = "Column name"
In the below example, we have merged two data frames data_A and data_B having column ‘City’ as the common grouping factor. The data is then merged into the common column name in the argument list.
Example 1
rm(list = ls())
City <- c("Pune","Mumbai","Kerala","Shirur","Karnataka")
Cost <- c(1200,1300,1400,1500,1600)
data_A <- data.frame(City,Cost,stringsAsFactors = FALSE)
City <- c("Pune","Mumbai","Kerala","Shirur","Karnataka","Chennai","Pimple Saudagar")
Code <- c(1,3,5,1,2,4,55)
data_B <- data.frame(City,Code,stringsAsFactors = FALSE)
merge(data_A, data_B, by.x="City")
Output:
So, you see, only those row values of the column ‘City’ that belong to both the data frames are merged. It ignores all other values.
City Cost Code
1 Karnataka 1600 2
2 Kerala 1400 5
3 Mumbai 1300 3
4 Pune 1200 1
5 Shirur 1500 1
Example 2
Now, what if both the data frames contain one or more common columns?
Don’t worry, we do have a solution for the same.
In the below example, the columns ‘City’ and ‘Cost’ are common to both the data frames. We would definitely merge the data frames according to the column ‘City’ but will provide it as an argument : by.x = 'City'
and by.y = 'City'
as shown below.
rm(list = ls())
City <- c("Pune","Mumbai","Kerala","Shirur","Karnataka")
Cost <- c(1200,1300,1400,1500,1600)
data_A <- data.frame(City,Cost,stringsAsFactors = FALSE)
City <- c("Pune","Mumbai","Kerala","Shirur","Karnataka","Chennai","Pimple Saudagar")
Code <- c(1,3,5,1,2,4,55)
Cost <- c(12,1,3,14,55,16,3)
data_B <- data.frame(City,Cost,Code,stringsAsFactors = FALSE)
merge(data_A, data_B, by.x="City",by.y="City")
By this, we merge both the data frames and have Cost columns printed for both the data frames as shown below.
Output:
City Cost.x Cost.y Code
1 Karnataka 1600 55 2
2 Kerala 1400 3 5
3 Mumbai 1300 1 3
4 Pune 1200 12 1
5 Shirur 1500 14 1
2. Merging data frames using R dplyr library
Now is the time for us to implement the concept of merging data frames using ‘dplyr‘ library from R documentation.
The dplyr library provides us with two important functions:
left_join()
right_join()
These functions follow the concept of SQL Joins to merge the data frames.
That is, with left join, it combines all the values from the left data frame and only those values from the right data frame that have values in common to the left data frame.
On similar lines, the right join would merge all the data values from the right data frame with only those values that are in common from the left data frame.
In the below example, left_join() function
is used to merge the two data frames.
Syntax:
left_join(data1,data2, by = c("Common-column" = "Common-column")
Example 1
rm(list = ls())
City <- c("Pune","Mumbai","Kerala","Shirur","Karnataka")
Cost <- c(1200,1300,1400,1500,1600)
data_A <- data.frame(City,Cost,stringsAsFactors = FALSE)
City <- c("Pune","Mumbai","Kerala","Shirur","Karnataka","Chennai","Pimple Saudagar")
Code <- c(1,3,5,1,2,4,55)
Cost <- c(12,1,3,14,55,16,3)
data_B <- data.frame(City,Cost,Code,stringsAsFactors = FALSE)
library(dplyr)
left_join(data_A, data_B, by = c("City" = "City"))
Output:
As seen below, the data is merged along with the common column ‘City’. If you observe, you can see that all the values from the right data frame that are not present in the left data frame are omitted.
City Cost.x Cost.y Code
1 Pune 1200 12 1
2 Mumbai 1300 1 3
3 Kerala 1400 3 5
4 Shirur 1500 14 1
5 Karnataka 1600 55 2
On the similar lines, in the below example, we have implemented the concept of right_join()
function. As a result, all the data values from the right column along with the columns from the left data frame that have values in common are printed. All other values are replaced by NULL/NA values.
Example 2
rm(list = ls())
City <- c("Pune","Mumbai","Kerala","Shirur","Karnataka")
Cost <- c(1200,1300,1400,1500,1600)
data_A <- data.frame(City,Cost,stringsAsFactors = FALSE)
City <-c("Pune","Mumbai","Kerala","Shirur","Karnataka","Chennai","Pimple Saudagar")
Code <- c(1,3,5,1,2,4,55)
Cost <- c(12,1,3,14,55,16,3)
data_B <- data.frame(City,Cost,Code,stringsAsFactors = FALSE)
library(dplyr)
right_join(data_A, data_B, by = c("City" = "City"))
Output:
City Cost.x Cost.y Code
1 Pune 1200 12 1
2 Mumbai 1300 1 3
3 Kerala 1400 3 5
4 Shirur 1500 14 1
5 Karnataka 1600 55 2
6 Chennai NA 16 4
7 Pimple Saudagar NA 3 55
Conclusion
By this, we have come to the end of this topic. Feel free to comment below, in case you come across any question.
Try implementing the concept of Merging data frames and do let us know about your understanding in the comment box.
Till then, Happy Learning!! 🙂