Merge Data Frames in R – 2 Easy Ways!

Filed Under: R Programming
MERGING DATA FRAMES IN R

Hello, readers! In this article, we would be focusing on 2 Easy ways of Merge Data Frames in R programming.

So, let us begin!


1. Merge data frames using merge() function

Let us now learn to merge data frames in R. R provides us with in-built merge() function to combine two data frames easily.

At first, we create two data frames using data.frame() function as discussed above. We need to remember that all the data frames on which we will perform merging, needs to have a common column name.

Syntax:

merge(data_frame1, data_frame2, by.x = "Column name"

In the below example, we have merged two data frames data_A and data_B having column ‘City’ as the common grouping factor. The data is then merged into the common column name in the argument list.

Example 1

rm(list = ls())

City <- c("Pune","Mumbai","Kerala","Shirur","Karnataka")
Cost <- c(1200,1300,1400,1500,1600)

data_A <- data.frame(City,Cost,stringsAsFactors = FALSE)

City <- c("Pune","Mumbai","Kerala","Shirur","Karnataka","Chennai","Pimple Saudagar")
Code <- c(1,3,5,1,2,4,55)

data_B <- data.frame(City,Code,stringsAsFactors = FALSE)
merge(data_A, data_B, by.x="City")

Output:

So, you see, only those row values of the column ‘City’ that belong to both the data frames are merged. It ignores all other values.

    City    Cost  Code
1 Karnataka 1600    2
2    Kerala 1400    5
3    Mumbai 1300    3
4      Pune 1200    1
5    Shirur 1500    1

Example 2

Now, what if both the data frames contain one or more common columns?

Don’t worry, we do have a solution for the same.

In the below example, the columns ‘City’ and ‘Cost’ are common to both the data frames. We would definitely merge the data frames according to the column ‘City’ but will provide it as an argument : by.x = 'City' and by.y = 'City' as shown below.

rm(list = ls())

City <- c("Pune","Mumbai","Kerala","Shirur","Karnataka")
Cost <- c(1200,1300,1400,1500,1600)

data_A <- data.frame(City,Cost,stringsAsFactors = FALSE)

City <- c("Pune","Mumbai","Kerala","Shirur","Karnataka","Chennai","Pimple Saudagar")
Code <- c(1,3,5,1,2,4,55)
Cost <- c(12,1,3,14,55,16,3)

data_B <- data.frame(City,Cost,Code,stringsAsFactors = FALSE)
merge(data_A, data_B, by.x="City",by.y="City")

By this, we merge both the data frames and have Cost columns printed for both the data frames as shown below.

Output:

       City Cost.x  Cost.y   Code
1 Karnataka   1600     55    2
2    Kerala   1400      3    5
3    Mumbai   1300      1    3
4      Pune   1200     12    1
5    Shirur   1500     14    1

2. Merging data frames using R dplyr library

Now is the time for us to implement the concept of merging data frames using ‘dplyr‘ library from R documentation.

The dplyr library provides us with two important functions:

  1. left_join()
  2. right_join()

These functions follow the concept of SQL Joins to merge the data frames.

That is, with left join, it combines all the values from the left data frame and only those values from the right data frame that have values in common to the left data frame.

On similar lines, the right join would merge all the data values from the right data frame with only those values that are in common from the left data frame.

In the below example, left_join() function is used to merge the two data frames.

Syntax:

left_join(data1,data2, by = c("Common-column" = "Common-column")

Example 1

rm(list = ls())

City <- c("Pune","Mumbai","Kerala","Shirur","Karnataka")
Cost <- c(1200,1300,1400,1500,1600)

data_A <- data.frame(City,Cost,stringsAsFactors = FALSE)

City <- c("Pune","Mumbai","Kerala","Shirur","Karnataka","Chennai","Pimple Saudagar")
Code <- c(1,3,5,1,2,4,55)
Cost <- c(12,1,3,14,55,16,3)

data_B <- data.frame(City,Cost,Code,stringsAsFactors = FALSE)
library(dplyr)
left_join(data_A, data_B, by = c("City" = "City"))

Output:

As seen below, the data is merged along with the common column ‘City’. If you observe, you can see that all the values from the right data frame that are not present in the left data frame are omitted.

      City Cost.x Cost.y Code
1      Pune   1200     12    1
2    Mumbai   1300      1    3
3    Kerala   1400      3    5
4    Shirur   1500     14    1
5 Karnataka   1600     55    2

On the similar lines, in the below example, we have implemented the concept of right_join() function. As a result, all the data values from the right column along with the columns from the left data frame that have values in common are printed. All other values are replaced by NULL/NA values.

Example 2

rm(list = ls())

City <- c("Pune","Mumbai","Kerala","Shirur","Karnataka")
Cost <- c(1200,1300,1400,1500,1600)

data_A <- data.frame(City,Cost,stringsAsFactors = FALSE)

City <-c("Pune","Mumbai","Kerala","Shirur","Karnataka","Chennai","Pimple Saudagar")
Code <- c(1,3,5,1,2,4,55)
Cost <- c(12,1,3,14,55,16,3)

data_B <- data.frame(City,Cost,Code,stringsAsFactors = FALSE)
library(dplyr)
right_join(data_A, data_B, by = c("City" = "City"))

Output:

             City Cost.x Cost.y Code
1            Pune   1200     12    1
2          Mumbai   1300      1    3
3          Kerala   1400      3    5
4          Shirur   1500     14    1
5       Karnataka   1600     55    2
6         Chennai     NA     16    4
7 Pimple Saudagar     NA      3   55

Conclusion

By this, we have come to the end of this topic. Feel free to comment below, in case you come across any question.

Try implementing the concept of Merging data frames and do let us know about your understanding in the comment box.

Till then, Happy Learning!! 馃檪

Leave a Reply

Your email address will not be published. Required fields are marked *

close
Generic selectors
Exact matches only
Search in title
Search in content
Search in posts
Search in pages