Pandas merge() – Merging Two DataFrame Objects

Filed Under: Python
Pandas Dataframe Merge

Pandas DataFrame merge() function is used to merge two DataFrame objects with a database-style join operation. The joining is performed on columns or indexes.

If the joining is done on columns, indexes are ignored. This function returns a new DataFrame and the source DataFrame objects are unchanged.

Pandas DataFrame merge() Function Syntax

The merge() function syntax is:


def merge(
    self,
    right,
    how="inner",
    on=None,
    left_on=None,
    right_on=None,
    left_index=False,
    right_index=False,
    sort=False,
    suffixes=("_x", "_y"),
    copy=True,
    indicator=False,
    validate=None,
)
  • right: The other DataFrame to merge with the source DataFrame.
  • how: {‘left’, ‘right’, ‘outer’, ‘inner’}, default ‘inner’. This is the most important parameter to define the merge operation type. These are similar to SQL left outer join, right outer join, full outer join, and inner join.
  • on: Column or index level names to join on. These columns must be present in both the DataFrames. If not provided, the intersection of the columns in both DataFrames are used.
  • left_on: Column or index level names to join on in the left DataFrame.
  • right_on: Column or index level names to join on in the right DataFrame.
  • left_index: Use the index from the left DataFrame as the join key(s).
  • right_index: Use the index from the right DataFrame as the join key.
  • sort: Sort the join keys lexicographically in the result DataFrame.
  • suffixes: Suffix to apply to overlapping column names in the left and right side, respectively.
  • indicator: If True, adds a column to output DataFrame called “_merge” with information on the source of each row.
  • validate: used to validate the merge process. The valid values are {“one_to_one” or “1:1”, “one_to_many” or “1:m”, “many_to_one” or “m:1”, “many_to_many” or “m:m”}.

Pandas DataFrame merge() Examples

Let’s look at some examples of merging two DataFrame objects.

1. Default Merging – inner join


import pandas as pd

d1 = {'Name': ['Pankaj', 'Meghna', 'Lisa'], 'Country': ['India', 'India', 'USA'], 'Role': ['CEO', 'CTO', 'CTO']}

df1 = pd.DataFrame(d1)

print('DataFrame 1:\n', df1)

df2 = pd.DataFrame({'ID': [1, 2, 3], 'Name': ['Pankaj', 'Anupam', 'Amit']})
print('DataFrame 2:\n', df2)

df_merged = df1.merge(df2)
print('Result:\n', df_merged)

Output:


DataFrame 1:
      Name Country Role
0  Pankaj   India  CEO
1  Meghna   India  CTO
2    Lisa     USA  CTO
DataFrame 2:
    ID    Name
0   1  Pankaj
1   2  Anupam
2   3    Amit
Result:
      Name Country Role  ID
0  Pankaj   India  CEO   1

2. Merging DataFrames with Left, Right, and Outer Join


print('Result Left Join:\n', df1.merge(df2, how='left'))
print('Result Right Join:\n', df1.merge(df2, how='right'))
print('Result Outer Join:\n', df1.merge(df2, how='outer'))

Output:


Result Left Join:
      Name Country Role   ID
0  Pankaj   India  CEO  1.0
1  Meghna   India  CTO  NaN
2    Lisa     USA  CTO  NaN
Result Right Join:
      Name Country Role  ID
0  Pankaj   India  CEO   1
1  Anupam     NaN  NaN   2
2    Amit     NaN  NaN   3
Result Outer Join:
      Name Country Role   ID
0  Pankaj   India  CEO  1.0
1  Meghna   India  CTO  NaN
2    Lisa     USA  CTO  NaN
3  Anupam     NaN  NaN  2.0
4    Amit     NaN  NaN  3.0

3. Merging DataFrame on Specific Columns


import pandas as pd

d1 = {'Name': ['Pankaj', 'Meghna', 'Lisa'], 'ID': [1, 2, 3], 'Country': ['India', 'India', 'USA'],
      'Role': ['CEO', 'CTO', 'CTO']}
df1 = pd.DataFrame(d1)

df2 = pd.DataFrame({'ID': [1, 2, 3], 'Name': ['Pankaj', 'Anupam', 'Amit']})

print(df1.merge(df2, on='ID'))
print(df1.merge(df2, on='Name'))

Output:


   Name_x  ID Country Role  Name_y
0  Pankaj   1   India  CEO  Pankaj
1  Meghna   2   India  CTO  Anupam
2    Lisa   3     USA  CTO    Amit

     Name  ID_x Country Role  ID_y
0  Pankaj     1   India  CEO     1

4. Specify Left and Right Columns for Merging DataFrame Objects


import pandas as pd

d1 = {'Name': ['Pankaj', 'Meghna', 'Lisa'], 'ID1': [1, 2, 3], 'Country': ['India', 'India', 'USA'],
      'Role': ['CEO', 'CTO', 'CTO']}
df1 = pd.DataFrame(d1)

df2 = pd.DataFrame({'ID2': [1, 2, 3], 'Name': ['Pankaj', 'Anupam', 'Amit']})

print(df1.merge(df2))

print(df1.merge(df2, left_on='ID1', right_on='ID2'))

Output;


     Name  ID1 Country Role  ID2
0  Pankaj    1   India  CEO    1

   Name_x  ID1 Country Role  ID2  Name_y
0  Pankaj    1   India  CEO    1  Pankaj
1  Meghna    2   India  CTO    2  Anupam
2    Lisa    3     USA  CTO    3    Amit

5. Using Index as the Join Keys for Merging DataFrames


import pandas as pd

d1 = {'Name': ['Pankaj', 'Meghna', 'Lisa'], 'Country': ['India', 'India', 'USA'], 'Role': ['CEO', 'CTO', 'CTO']}
df1 = pd.DataFrame(d1)

df2 = pd.DataFrame({'ID': [1, 2, 3], 'Name': ['Pankaj', 'Anupam', 'Amit']})

df_merged = df1.merge(df2)
print('Result Default Merge:\n', df_merged)

df_merged = df1.merge(df2, left_index=True, right_index=True)
print('\nResult Index Merge:\n', df_merged)

Output:


Result Default Merge:
      Name Country Role  ID
0  Pankaj   India  CEO   1

Result Index Merge:
    Name_x Country Role  ID  Name_y
0  Pankaj   India  CEO   1  Pankaj
1  Meghna   India  CTO   2  Anupam
2    Lisa     USA  CTO   3    Amit

References

Leave a Reply

Your email address will not be published. Required fields are marked *

close
Generic selectors
Exact matches only
Search in title
Search in content
Search in posts
Search in pages