Pandas DataFrame append() function

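The pandas DataFrame append() function appends the rows of another DataFrame object to the caller. It returns a new DataFrame object and does not modify the source objects. If the columns do not match, the columns missing from either side are added to the result DataFrame and filled with NaN. Note that append() was deprecated in pandas 1.4 and removed in pandas 2.0, so the examples here require an older version; on current releases, pd.concat() is the replacement.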

1. Pandas DataFrame append() Parameters

The append() function syntax is:


append(other, ignore_index=False, verify_integrity=False, sort=None)
  • other: the DataFrame, Series or dict-like object whose rows will be added to the caller DataFrame.
  • ignore_index: if True, the index labels of the source objects are discarded and the result is labeled 0, 1, ..., n-1.
  • verify_integrity: if True, raise ValueError if the resulting index would contain duplicates.
  • sort: if True, sort the columns of the result when the columns of the source DataFrames are not aligned. In pandas 0.23 through 0.25, leaving sort unset sorted the columns and emitted a FutureWarning, because that implicit default was deprecated. Passing sort=True or sort=False explicitly silences the warning. From pandas 1.0 onward, the default is not to sort.
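The other parameter also accepts a dict-like row. The following is a minimal sketch rather than a definitive recipe: it assumes pandas 2.0 or later, where append() is no longer available, so the dict is wrapped in a one-row DataFrame and passed to pd.concat(). The names new_row and result are illustrative only.

```python
import pandas as pd

df = pd.DataFrame({'Name': ['Pankaj', 'Lisa'], 'ID': [1, 2]})

# A dict-like row; the keys match the column names of df.
new_row = {'Name': 'David', 'ID': 3}

# Wrap the dict in a one-row DataFrame, then concatenate.
# ignore_index=True renumbers the rows 0..n-1.
result = pd.concat([df, pd.DataFrame([new_row])], ignore_index=True)
print(result)
```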

Let’s look into some examples of the DataFrame append() function.

2. Appending Two DataFrames


import pandas as pd

df1 = pd.DataFrame({'Name': ['Pankaj', 'Lisa'], 'ID': [1, 2]})
df2 = pd.DataFrame({'Name': ['David'], 'ID': [3]})

print(df1)
print(df2)

df3 = df1.append(df2)
print('\nResult DataFrame:\n', df3)

Output:


     Name  ID
0  Pankaj   1
1    Lisa   2
    Name  ID
0  David   3

Result DataFrame:
      Name  ID
0  Pankaj   1
1    Lisa   2
0   David   3
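On pandas 2.0 and later the call above raises AttributeError, because append() has been removed. A rough equivalent, assuming pd.concat() as the replacement, is sketched below; it also checks that the source DataFrames are left unchanged.

```python
import pandas as pd

df1 = pd.DataFrame({'Name': ['Pankaj', 'Lisa'], 'ID': [1, 2]})
df2 = pd.DataFrame({'Name': ['David'], 'ID': [3]})

# pd.concat takes a list of frames and returns a new DataFrame.
df3 = pd.concat([df1, df2])
print(df3)

# The inputs are left untouched.
print(len(df1), len(df2))  # 2 1
```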

3. Appending and Ignoring DataFrame Indexes

In the previous example, the output contains duplicate index labels (0 appears twice). We can pass ignore_index=True to discard the source indexes and assign a new sequential index to the output DataFrame.


df3 = df1.append(df2, ignore_index=True)
print(df3)

Output:


     Name  ID
0  Pankaj   1
1    Lisa   2
2   David   3
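If the rows have already been combined with duplicate labels, the index can also be renumbered afterwards. This is a small sketch of that alternative, assuming pd.concat() for the combining step and the standard reset_index() method; the variable names are illustrative.

```python
import pandas as pd

df1 = pd.DataFrame({'Name': ['Pankaj', 'Lisa'], 'ID': [1, 2]})
df2 = pd.DataFrame({'Name': ['David'], 'ID': [3]})

# Concatenating without ignore_index keeps the labels 0, 1, 0.
combined = pd.concat([df1, df2])

# drop=True discards the old labels instead of keeping them as a column.
renumbered = combined.reset_index(drop=True)
print(renumbered)
```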

4. Raising ValueError for Duplicate Indexes

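We can pass verify_integrity=True to raise a ValueError if the index of the result would contain duplicate labels. In the following example both DataFrames use the default index, so the label 0 appears in each of them.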


import pandas as pd

df1 = pd.DataFrame({'Name': ['Pankaj', 'Lisa'], 'ID': [1, 2]})
df2 = pd.DataFrame({'Name': ['David'], 'ID': [3]})

df3 = df1.append(df2, verify_integrity=True)

Output:


ValueError: Indexes have overlapping values: Int64Index([0], dtype='int64')

Let’s look at another example where we don’t have duplicate indexes.


import pandas as pd

df1 = pd.DataFrame({'Name': ['Pankaj', 'Lisa'], 'ID': [1, 2]}, index=[100, 200])

df2 = pd.DataFrame({'Name': ['David'], 'ID': [3]}, index=[300])

df3 = df1.append(df2, verify_integrity=True)

print(df3)

Output:


       Name  ID
100  Pankaj   1
200    Lisa   2
300   David   3
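In practice the overlap check is usually wrapped in a try/except block so that the program can react to it. The sketch below assumes pd.concat(), which accepts the same verify_integrity flag; the overlap_detected variable is only there for illustration.

```python
import pandas as pd

df1 = pd.DataFrame({'Name': ['Pankaj', 'Lisa'], 'ID': [1, 2]})
df2 = pd.DataFrame({'Name': ['David'], 'ID': [3]})

# Both frames use the default index starting at 0, so label 0 overlaps.
try:
    pd.concat([df1, df2], verify_integrity=True)
    overlap_detected = False
except ValueError as err:
    overlap_detected = True
    print('Rejected:', err)
```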

5. Appending DataFrame objects with Non-Matching Columns


import pandas as pd

df1 = pd.DataFrame({'Name': ['Pankaj', 'Lisa'], 'ID': [1, 2]})
df2 = pd.DataFrame({'Name': ['Pankaj', 'David'], 'ID': [1, 3], 'Role': ['CEO', 'Author']})

df3 = df1.append(df2, sort=False)

print(df3)

Output:


     Name  ID    Role
0  Pankaj   1     NaN
1    Lisa   2     NaN
0  Pankaj   1     CEO
1   David   3  Author

We are explicitly passing sort=False to keep the columns in their original order and to suppress the FutureWarning. On pandas 0.23 through 0.25, if you don't pass this parameter, the columns are sorted and the following warning message is printed.


FutureWarning: Sorting because the non-concatenation axis is not aligned. A future version
of pandas will change to not sort by default.

To accept the future behavior, pass 'sort=False'.

To retain the current behavior and silence the warning, pass 'sort=True'.

Let’s see what happens when we pass sort=True.


import pandas as pd

df1 = pd.DataFrame({'Name': ['Pankaj', 'Lisa'], 'ID': [1, 2]})
df2 = pd.DataFrame({'Name': ['Pankaj', 'David'], 'ID': [1, 3], 'Role': ['CEO', 'Author']})

df3 = df1.append(df2, sort=True)

print(df3)

Output:


   ID    Name    Role
0   1  Pankaj     NaN
1   2    Lisa     NaN
0   1  Pankaj     CEO
1   3   David  Author

Notice that the columns are sorted alphabetically in the result DataFrame object. What was deprecated is the implicit sorting when sort is omitted, not the sort=True option itself.
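The effect of the sort flag on column order can be compared side by side. This sketch assumes pd.concat(), which takes the same sort parameter and works on current pandas releases; unsorted_cols and sorted_cols are illustrative names.

```python
import pandas as pd

df1 = pd.DataFrame({'Name': ['Pankaj', 'Lisa'], 'ID': [1, 2]})
df2 = pd.DataFrame({'Name': ['Pankaj', 'David'], 'ID': [1, 3], 'Role': ['CEO', 'Author']})

# sort=False keeps columns in order of first appearance.
unsorted_cols = list(pd.concat([df1, df2], sort=False).columns)

# sort=True orders the column labels alphabetically.
sorted_cols = list(pd.concat([df1, df2], sort=True).columns)

print(unsorted_cols)  # ['Name', 'ID', 'Role']
print(sorted_cols)    # ['ID', 'Name', 'Role']
```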

Let’s look at another example where we have non-matching columns with int values.


import pandas as pd

df1 = pd.DataFrame({'ID': [1, 2]})
df2 = pd.DataFrame({'Name': ['Pankaj', 'Lisa']})

df3 = df1.append(df2, sort=False)
print(df3)

Output:


    ID    Name
0  1.0     NaN
1  2.0     NaN
0  NaN  Pankaj
1  NaN    Lisa

Notice that the ID values are converted to floating-point numbers. NaN is a float value and cannot be stored in an int64 column, so pandas upcasts the whole column to float64.
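If integer values are needed despite the missing entries, one option is the nullable integer dtype. The sketch below assumes a recent pandas version that provides the 'Int64' extension dtype, and uses pd.concat() for the combining step.

```python
import pandas as pd

df1 = pd.DataFrame({'ID': [1, 2]})
df2 = pd.DataFrame({'Name': ['Pankaj', 'Lisa']})

df3 = pd.concat([df1, df2], ignore_index=True, sort=False)
print(df3['ID'].dtype)  # float64

# Convert to the nullable integer dtype; missing entries become <NA>.
df3['ID'] = df3['ID'].astype('Int64')
print(df3)
```

Note the capital I in 'Int64': the lowercase 'int64' is the regular NumPy dtype, which cannot hold missing values.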
