Pandas Concat Two DataFrames

Pandas is a powerful Python library for data manipulation and analysis. A common operation in data analysis is combining data from different sources, which pandas supports through the concat function. This article explains how to use pandas.concat() to concatenate two DataFrames along an axis, either vertically (row-wise) or horizontally (column-wise), and covers the parameters that customize the result.

Introduction to pandas.concat()

The pandas.concat() function concatenates two or more pandas DataFrames or Series and returns a new object. You can concatenate row-wise (axis=0) or column-wise (axis=1), choose whether labels on the other axis are combined by union or intersection, and control how the resulting index is built. Where an input has no value for a given label, the result contains NaN.
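Since the function accepts Series as well as DataFrames, a minimal sketch with two Series may help show the effect of the axis argument. The Series names and values here are made up for illustration.

```python
import pandas as pd

# Hypothetical Series used only for illustration
s1 = pd.Series([1, 2], name='x')
s2 = pd.Series([3, 4], name='y')

# axis=0 (the default): the values are stacked into one longer Series
stacked = pd.concat([s1, s2])

# axis=1: each Series becomes a column named after the Series
side_by_side = pd.concat([s1, s2], axis=1)

print(type(stacked).__name__)      # Series
print(list(side_by_side.columns))  # ['x', 'y']
```

Note that the stacked Series keeps the original labels (0, 1, 0, 1), so the index is no longer unique unless ignore_index=True is passed.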

Basic Syntax of pandas.concat()

The basic syntax of the pandas.concat() function is:

pandas.concat(objs, axis=0, join='outer', ignore_index=False, keys=None, levels=None, names=None, verify_integrity=False, sort=False, copy=True)
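  • objs: A sequence (such as a list) or a mapping (such as a dict) of DataFrames or Series to concatenate.
  • axis: The axis to concatenate along. 0 stacks objects as rows (the default), and 1 places them side by side as columns.
  • join: How to handle labels on the other axis. 'outer' (the default) keeps the union of the labels, and 'inner' keeps only their intersection.
  • ignore_index: If True, discard the labels along the concatenation axis and number the result 0 to n-1 instead.
  • keys: Labels, one per input object, that form the outermost level of a hierarchical index on the concatenation axis.
  • names: Names for the levels of the resulting hierarchical index.
  • verify_integrity: If True, raise a ValueError when the concatenated axis contains duplicate labels.
  • sort: If True, sort the non-concatenation axis when it is not already aligned.

The default value of copy differs between pandas versions, so check the signature for the version you have installed.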
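To make the join parameter concrete, here is a small sketch that concatenates two frames sharing only one column. The frames left and right are hypothetical and exist only for this illustration.

```python
import pandas as pd

# Hypothetical frames that share only column 'B'
left = pd.DataFrame({'A': [1, 2], 'B': [3, 4]})
right = pd.DataFrame({'B': [5, 6], 'C': [7, 8]})

# join='outer' keeps every column seen in any input; gaps become NaN
outer = pd.concat([left, right], join='outer', ignore_index=True)

# join='inner' keeps only the columns present in every input
inner = pd.concat([left, right], join='inner', ignore_index=True)

print(list(outer.columns))  # ['A', 'B', 'C']
print(list(inner.columns))  # ['B']
```

With join='outer', column A is NaN for the two rows that came from right, and column C is NaN for the two rows that came from left.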

Examples of Concatenating Two DataFrames

Example 1: Basic Vertical Concatenation

Concatenating two DataFrames vertically stacks the rows of the second DataFrame below the rows of the first. The result is a new DataFrame; neither input is modified.

import pandas as pd

# Create two DataFrames
df1 = pd.DataFrame({
    'A': ['A0', 'A1', 'A2', 'A3'],
    'B': ['B0', 'B1', 'B2', 'B3'],
    'C': ['C0', 'C1', 'C2', 'C3'],
    'D': ['D0', 'D1', 'D2', 'D3']
}, index=[0, 1, 2, 3])

df2 = pd.DataFrame({
    'A': ['A4', 'A5', 'A6', 'A7'],
    'B': ['B4', 'B5', 'B6', 'B7'],
    'C': ['C4', 'C5', 'C6', 'C7'],
    'D': ['D4', 'D5', 'D6', 'D7']
}, index=[4, 5, 6, 7])

# Concatenate DataFrames
result = pd.concat([df1, df2])
print(result)

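Output:

The result is a DataFrame with eight rows: the four rows of df1 (index 0 to 3) followed by the four rows of df2 (index 4 to 7), under the shared columns A, B, C, and D.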
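As a quick check on vertical concatenation, the following sketch uses smaller, hypothetical versions of df1 and df2 to confirm that the result is a new object and that the inputs keep their original shape.

```python
import pandas as pd

# Smaller stand-ins for the frames in Example 1
df1 = pd.DataFrame({'A': ['A0', 'A1'], 'B': ['B0', 'B1']}, index=[0, 1])
df2 = pd.DataFrame({'A': ['A2', 'A3'], 'B': ['B2', 'B3']}, index=[2, 3])

result = pd.concat([df1, df2])

print(result.shape)        # (4, 2)
print(df1.shape)           # (2, 2)
print(list(result.index))  # [0, 1, 2, 3]
```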

Example 2: Horizontal Concatenation with Different Indices

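Concatenating horizontally places the columns of the second DataFrame next to those of the first, aligning rows by index label. When the indices differ, the join parameter decides which labels are kept: join='outer' keeps every label and fills the gaps with NaN, while join='inner' keeps only the labels present in both.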

import pandas as pd

# Create two DataFrames with different indices
df1 = pd.DataFrame({
    'A': ['A0', 'A1', 'A2'],
    'B': ['B0', 'B1', 'B2']
}, index=[0, 1, 2])

df2 = pd.DataFrame({
    'C': ['C0', 'C1', 'C2'],
    'D': ['D0', 'D1', 'D2']
}, index=[1, 2, 3])

# Concatenate DataFrames horizontally
result = pd.concat([df1, df2], axis=1, join='outer')
print(result)

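Output:

The result has four rows, one for each label in the union of the two indices (0, 1, 2, 3). Row 0 has NaN in columns C and D because df2 has no label 0, and row 3 has NaN in columns A and B because df1 has no label 3.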
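For comparison, here is a sketch of the same horizontal concatenation with join='inner', which keeps only the index labels that both frames share.

```python
import pandas as pd

df1 = pd.DataFrame({
    'A': ['A0', 'A1', 'A2'],
    'B': ['B0', 'B1', 'B2']
}, index=[0, 1, 2])

df2 = pd.DataFrame({
    'C': ['C0', 'C1', 'C2'],
    'D': ['D0', 'D1', 'D2']
}, index=[1, 2, 3])

# Only labels 1 and 2 appear in both indices, so only those rows survive
inner = pd.concat([df1, df2], axis=1, join='inner')

print(list(inner.index))     # [1, 2]
print(inner.loc[1].tolist()) # ['A1', 'B1', 'C0', 'D0']
```

An inner join never introduces NaN, at the cost of dropping rows that are missing from either input.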

Example 3: Ignoring the Index During Concatenation

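Sometimes the index carries no meaning, for example when it is just the default row numbering of each input. Passing ignore_index=True discards the original labels and numbers the rows of the result from 0 to n-1.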

import pandas as pd

# Create two DataFrames
df1 = pd.DataFrame({
    'A': ['A0', 'A1', 'A2', 'A3'],
    'B': ['B0', 'B1', 'B2', 'B3']
})

df2 = pd.DataFrame({
    'A': ['A4', 'A5', 'A6', 'A7'],
    'B': ['B4', 'B5', 'B6', 'B7']
})

# Concatenate DataFrames ignoring the index
result = pd.concat([df1, df2], ignore_index=True)
print(result)

Output:

The result has eight rows numbered 0 through 7; the original labels 0 through 3 from each input are discarded.
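To see why ignore_index matters, the sketch below contrasts the default behaviour with ignore_index=True on two small, hypothetical frames that both use the default index.

```python
import pandas as pd

df1 = pd.DataFrame({'A': ['A0', 'A1']})
df2 = pd.DataFrame({'A': ['A2', 'A3']})

# Default: the original labels are kept, so 0 and 1 each appear twice
kept = pd.concat([df1, df2])

# ignore_index=True: the result is renumbered from 0
renumbered = pd.concat([df1, df2], ignore_index=True)

print(list(kept.index))        # [0, 1, 0, 1]
print(list(renumbered.index))  # [0, 1, 2, 3]
print(len(kept.loc[0]))        # 2
```

With the default, kept.loc[0] returns two rows rather than one, which is easy to overlook in later label-based lookups.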

Example 4: Concatenation with Multi-level Index

Passing keys adds an outer index level that records which input each row came from, so the original pieces can still be identified and selected after concatenation.

import pandas as pd

# Create two DataFrames
df1 = pd.DataFrame({
    'A': ['A0', 'A1', 'A2', 'A3'],
    'B': ['B0', 'B1', 'B2', 'B3']
})

df2 = pd.DataFrame({
    'A': ['A4', 'A5', 'A6', 'A7'],
    'B': ['B4', 'B5', 'B6', 'B7']
})

# Concatenate with keys
result = pd.concat([df1, df2], keys=['pandasdataframe.com_df1', 'pandasdataframe.com_df2'])
print(result)

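Output:

The result has a two-level index: the outer level holds the key ('pandasdataframe.com_df1' or 'pandasdataframe.com_df2') and the inner level holds each row's original label (0 through 3).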
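Once a hierarchical index exists, the keys can be used to pull a piece back out. The following sketch assumes shorter, hypothetical keys ('first', 'second') and level names ('source', 'row') chosen only for this illustration.

```python
import pandas as pd

df1 = pd.DataFrame({'A': ['A0', 'A1'], 'B': ['B0', 'B1']})
df2 = pd.DataFrame({'A': ['A2', 'A3'], 'B': ['B2', 'B3']})

# keys labels each input; names labels the two index levels
result = pd.concat([df1, df2], keys=['first', 'second'], names=['source', 'row'])

# Select every row that came from the first input
first_part = result.loc['first']

# Passing a dict is equivalent: its keys are used as the keys argument
from_dict = pd.concat({'first': df1, 'second': df2}, names=['source', 'row'])

# Turn the outer level into an ordinary column
flat = result.reset_index(level='source')

print(list(result.index.names))   # ['source', 'row']
print(first_part['A'].tolist())   # ['A0', 'A1']
print(flat['source'].tolist())    # ['first', 'first', 'second', 'second']
```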

Example 5: Handling Duplicate Indices

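By default, pd.concat keeps duplicate index labels without complaint, which can cause ambiguous lookups later. Passing verify_integrity=True makes it raise a ValueError when the concatenated index would contain duplicates.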

import pandas as pd

# Create two DataFrames with duplicate indices
df1 = pd.DataFrame({
    'A': ['A0', 'A1', 'A2', 'A3'],
    'B': ['B0', 'B1', 'B2', 'B3']
}, index=[0, 1, 2, 3])

df2 = pd.DataFrame({
    'A': ['A4', 'A5', 'A6', 'A7'],
    'B': ['B4', 'B5', 'B6', 'B7']
}, index=[2, 3, 4, 5])

# Concatenate checking for integrity
try:
    result = pd.concat([df1, df2], verify_integrity=True)
    print(result)
except ValueError as e:
    print("Error:", e)

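Output:

Because labels 2 and 3 appear in both indices, pd.concat raises a ValueError, and the except branch prints a message beginning with "Error:" that reports the overlapping index values. The exact wording of the message depends on the pandas version.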
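Once an overlap is detected, there are a few ways to resolve it. The sketch below shows two possible approaches, using small hypothetical frames; which one is appropriate depends on whether the original labels still matter.

```python
import pandas as pd

df1 = pd.DataFrame({'A': ['A0', 'A1', 'A2']}, index=[0, 1, 2])
df2 = pd.DataFrame({'A': ['A3', 'A4', 'A5']}, index=[2, 3, 4])

# Find the labels that both indices share before concatenating
overlap = df1.index.intersection(df2.index)
print(list(overlap))  # [2]

# Option 1: discard the old labels so the result is numbered 0 to n-1
renumbered = pd.concat([df1, df2], ignore_index=True)

# Option 2: keep the old labels but add an outer key per source
keyed = pd.concat([df1, df2], keys=['df1', 'df2'])

print(renumbered.index.is_unique)  # True
print(keyed.index.is_unique)       # True
```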

Conclusion

Concatenating DataFrames is a fundamental operation in data manipulation and analysis using pandas. The pandas.concat() function provides a robust way to combine DataFrames or Series either vertically or horizontally. By understanding and utilizing the parameters such as axis, join, ignore_index, and keys, you can handle various data alignment and integrity issues effectively. This flexibility makes pandas a powerful tool for preparing and analyzing data from diverse sources.