Pandas Concat Two DataFrames
Pandas is a powerful Python library for data manipulation and analysis. A common operation in data analysis is combining data from different sources, which can be done efficiently with the pandas.concat() function. This article explores how to use pandas.concat() to concatenate two DataFrames along a particular axis, either vertically or horizontally, and discusses the parameters that customize the concatenation.
Introduction to pandas.concat()
The pandas.concat() function is versatile and can concatenate two or more pandas DataFrames or Series. It offers a lot of flexibility: you can concatenate row-wise (axis=0) or column-wise (axis=1), choose how labels on the other axis are aligned, and control how the resulting index is built. Under the default outer join, positions where an input has no data for a label are filled with NaN.
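As a minimal sketch of the point that pandas.concat() accepts Series as well as DataFrames, and more than two objects at once, the snippet below stacks three small Series and then places two of them side by side. The Series names and values are invented for illustration.

```python
import pandas as pd

# Hypothetical Series, invented for illustration
s1 = pd.Series([1, 2], name="x")
s2 = pd.Series([3, 4], name="x")
s3 = pd.Series([5, 6], name="x")

# Row-wise (default axis=0): values are stacked and the original
# index labels of each Series are kept
stacked = pd.concat([s1, s2, s3])

# Column-wise (axis=1): each Series becomes one column of a DataFrame
side_by_side = pd.concat([s1, s2.rename("y")], axis=1)

print(stacked)
print(side_by_side)
```

Note that the stacked result repeats the labels 0 and 1, because each input Series carried its own default index.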
Basic Syntax of pandas.concat()
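The basic syntax of the pandas.concat() function is:
pandas.concat(objs, axis=0, join='outer', ignore_index=False, keys=None, levels=None, names=None, verify_integrity=False, sort=False, copy=True)
The exact signature varies slightly between pandas versions. For example, newer releases require the arguments after objs to be passed by keyword and use a different default for copy.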
objs: A sequence (such as a list) or dictionary of pandas DataFrames or Series to concatenate.
axis: The axis to concatenate along. 0 means along rows (the default), and 1 means along columns.
join: How to handle labels on the other axis. 'outer' takes the union and 'inner' takes the intersection.
ignore_index: If True, do not use the index values along the concatenation axis; the result is labeled 0, 1, ..., n - 1 instead.
keys: Labels, one per input object, that become the outermost level of a hierarchical index.
verify_integrity: If True, check whether the new concatenated axis contains duplicates and raise a ValueError if it does.
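To make the join parameter concrete, here is a small sketch of a row-wise concatenation in which the two inputs share only one column. The DataFrames left and right are hypothetical and exist only for this illustration.

```python
import pandas as pd

# Hypothetical DataFrames that share only column 'B'
left = pd.DataFrame({"A": ["A0", "A1"], "B": ["B0", "B1"]})
right = pd.DataFrame({"B": ["B2", "B3"], "C": ["C2", "C3"]})

# join='outer' keeps the union of the columns; missing cells become NaN
outer = pd.concat([left, right], join="outer", ignore_index=True)

# join='inner' keeps only the columns present in every input
inner = pd.concat([left, right], join="inner", ignore_index=True)

print(outer)
print(inner)
```

With join='outer' the result has columns A, B, and C, while join='inner' leaves only B. Which one is appropriate depends on whether the non-shared columns matter downstream.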
Examples of Concatenating Two DataFrames
Example 1: Basic Vertical Concatenation
Concatenating two DataFrames vertically means adding the rows of the second DataFrame to the first.
import pandas as pd

# Create two DataFrames
df1 = pd.DataFrame({
    'A': ['A0', 'A1', 'A2', 'A3'],
    'B': ['B0', 'B1', 'B2', 'B3'],
    'C': ['C0', 'C1', 'C2', 'C3'],
    'D': ['D0', 'D1', 'D2', 'D3']
}, index=[0, 1, 2, 3])
df2 = pd.DataFrame({
    'A': ['A4', 'A5', 'A6', 'A7'],
    'B': ['B4', 'B5', 'B6', 'B7'],
    'C': ['C4', 'C5', 'C6', 'C7'],
    'D': ['D4', 'D5', 'D6', 'D7']
}, index=[4, 5, 6, 7])

# Concatenate DataFrames vertically (axis=0 is the default)
result = pd.concat([df1, df2])
print(result)
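Output:
    A   B   C   D
0  A0  B0  C0  D0
1  A1  B1  C1  D1
2  A2  B2  C2  D2
3  A3  B3  C3  D3
4  A4  B4  C4  D4
5  A5  B5  C5  D5
6  A6  B6  C6  D6
7  A7  B7  C7  D7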
Example 2: Horizontal Concatenation with Different Indices
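Concatenating horizontally places the columns of the second DataFrame next to those of the first, aligning rows on their index labels. When the indices differ, the join parameter decides what happens to labels found in only one DataFrame: 'outer' keeps every label and fills the gaps with NaN, while 'inner' keeps only the labels the two share.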
import pandas as pd

# Create two DataFrames with different indices
df1 = pd.DataFrame({
    'A': ['A0', 'A1', 'A2'],
    'B': ['B0', 'B1', 'B2']
}, index=[0, 1, 2])
df2 = pd.DataFrame({
    'C': ['C0', 'C1', 'C2'],
    'D': ['D0', 'D1', 'D2']
}, index=[1, 2, 3])

# Concatenate DataFrames horizontally
result = pd.concat([df1, df2], axis=1, join='outer')
print(result)
Output:
     A    B    C    D
0   A0   B0  NaN  NaN
1   A1   B1   C0   D0
2   A2   B2   C1   D1
3  NaN  NaN   C2   D2
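For contrast with the outer join used in Example 2, the following sketch reuses the same two DataFrames with join='inner', which keeps only the index labels present in both inputs (here 1 and 2). The variable name inner is introduced just for this illustration.

```python
import pandas as pd

# Same data as Example 2: the indices overlap only on labels 1 and 2
df1 = pd.DataFrame({
    'A': ['A0', 'A1', 'A2'],
    'B': ['B0', 'B1', 'B2']
}, index=[0, 1, 2])
df2 = pd.DataFrame({
    'C': ['C0', 'C1', 'C2'],
    'D': ['D0', 'D1', 'D2']
}, index=[1, 2, 3])

# join='inner' drops rows whose label is missing from either DataFrame
inner = pd.concat([df1, df2], axis=1, join='inner')
print(inner)
```

Because labels 0 and 3 each appear in only one DataFrame, they are dropped, and the result contains no NaN values.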
Example 3: Ignoring the Index During Concatenation
Sometimes, the index itself might not be relevant, and you might want to ignore it during concatenation.
import pandas as pd

# Create two DataFrames (both get the default index 0..3)
df1 = pd.DataFrame({
    'A': ['A0', 'A1', 'A2', 'A3'],
    'B': ['B0', 'B1', 'B2', 'B3']
})
df2 = pd.DataFrame({
    'A': ['A4', 'A5', 'A6', 'A7'],
    'B': ['B4', 'B5', 'B6', 'B7']
})

# Concatenate DataFrames ignoring the index
result = pd.concat([df1, df2], ignore_index=True)
print(result)
Output:
    A   B
0  A0  B0
1  A1  B1
2  A2  B2
3  A3  B3
4  A4  B4
5  A5  B5
6  A6  B6
7  A7  B7
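To see why ignore_index matters, the sketch below compares the index produced with and without it. The two small DataFrames are hypothetical stand-ins for those in Example 3.

```python
import pandas as pd

# Hypothetical DataFrames, each with the default index 0, 1
df1 = pd.DataFrame({"A": ["A0", "A1"]})
df2 = pd.DataFrame({"A": ["A2", "A3"]})

kept = pd.concat([df1, df2])                       # original labels preserved
reset = pd.concat([df1, df2], ignore_index=True)   # fresh 0..n-1 labels

print(kept.index.tolist())   # [0, 1, 0, 1]
print(reset.index.tolist())  # [0, 1, 2, 3]

# With repeated labels, a label lookup returns every matching row
print(len(kept.loc[0]))      # 2
```

If the original labels carry meaning (for example, timestamps or IDs), keeping them may be preferable. ignore_index=True is mainly useful when the labels are just row counters.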
Example 4: Concatenation with Multi-level Index
Passing keys creates a hierarchical index whose outer level records which input DataFrame each row came from, which makes the pieces easy to identify after concatenation.
import pandas as pd

# Create two DataFrames
df1 = pd.DataFrame({
    'A': ['A0', 'A1', 'A2', 'A3'],
    'B': ['B0', 'B1', 'B2', 'B3']
})
df2 = pd.DataFrame({
    'A': ['A4', 'A5', 'A6', 'A7'],
    'B': ['B4', 'B5', 'B6', 'B7']
})

# Concatenate with keys
result = pd.concat([df1, df2], keys=['pandasdataframe.com_df1', 'pandasdataframe.com_df2'])
print(result)
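Output:
                            A   B
pandasdataframe.com_df1 0  A0  B0
                        1  A1  B1
                        2  A2  B2
                        3  A3  B3
pandasdataframe.com_df2 0  A4  B4
                        1  A5  B5
                        2  A6  B6
                        3  A7  B7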
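Once keys are in place, the outer index level can be used to pull an original piece back out. The following sketch uses shorter, hypothetical keys ('first' and 'second') and also passes names to label the two index levels. It then shows that passing a dictionary as objs yields the same result, with the dictionary keys acting as keys.

```python
import pandas as pd

# Hypothetical DataFrames and key labels, invented for illustration
df1 = pd.DataFrame({"A": ["A0", "A1"], "B": ["B0", "B1"]})
df2 = pd.DataFrame({"A": ["A2", "A3"], "B": ["B2", "B3"]})

# keys labels each input; names labels the index levels themselves
result = pd.concat([df1, df2], keys=["first", "second"], names=["source", "row"])

# Select every row that came from the second DataFrame
second_part = result.loc["second"]

# Select a single cell by (outer key, inner label) and column
single = result.loc[("first", 1), "A"]

# A dictionary works too: its keys are used as the outer index level
from_dict = pd.concat({"first": df1, "second": df2}, names=["source", "row"])

print(second_part)
print(single)
```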
Example 5: Handling Duplicate Indices
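Duplicate index labels can silently compromise data integrity, because a label lookup may then return more than one row. Setting verify_integrity=True makes pd.concat() raise a ValueError if the concatenated index would contain duplicates.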
import pandas as pd

# Create two DataFrames whose indices overlap on labels 2 and 3
df1 = pd.DataFrame({
    'A': ['A0', 'A1', 'A2', 'A3'],
    'B': ['B0', 'B1', 'B2', 'B3']
}, index=[0, 1, 2, 3])
df2 = pd.DataFrame({
    'A': ['A4', 'A5', 'A6', 'A7'],
    'B': ['B4', 'B5', 'B6', 'B7']
}, index=[2, 3, 4, 5])

# Concatenate checking for integrity
try:
    result = pd.concat([df1, df2], verify_integrity=True)
    print(result)
except ValueError as e:
    print("Error:", e)
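Output: because index labels 2 and 3 appear in both DataFrames, pd.concat() raises a ValueError, and the except branch prints an error message reporting that the indexes have overlapping values (the exact wording depends on the pandas version).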
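When raising an error is too strict, duplicates can instead be inspected or resolved after the fact. The sketch below is one possible approach, not the only one: it finds the repeated labels with Index.duplicated() and then shows two ways to get a unique index. The single-column DataFrames are simplified, hypothetical versions of those in Example 5.

```python
import pandas as pd

# Simplified, hypothetical data with overlapping labels 2 and 3
df1 = pd.DataFrame({"A": ["A0", "A1", "A2", "A3"]}, index=[0, 1, 2, 3])
df2 = pd.DataFrame({"A": ["A4", "A5", "A6", "A7"]}, index=[2, 3, 4, 5])

combined = pd.concat([df1, df2])

# Report which labels occur more than once
dupes = combined.index[combined.index.duplicated()].tolist()
print(combined.index.is_unique)  # False
print(dupes)                     # [2, 3]

# Option 1: discard the old labels entirely
renumbered = pd.concat([df1, df2], ignore_index=True)

# Option 2: keep only the first row seen for each label
first_only = combined[~combined.index.duplicated(keep="first")]
print(first_only)
```

Option 2 drops rows, so it only makes sense when the later rows are true duplicates or known to be superseded. Otherwise renumbering, or adding keys, preserves all of the data.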
Pandas Concat Two DataFrames Conclusion
Concatenating DataFrames is a fundamental operation in data manipulation and analysis using pandas. The pandas.concat() function provides a robust way to combine DataFrames or Series either vertically or horizontally. By understanding and using parameters such as axis, join, ignore_index, keys, and verify_integrity, you can handle various data alignment and integrity issues effectively. This flexibility makes pandas a powerful tool for preparing and analyzing data from diverse sources.