Pandas Concat vs Merge
In data analysis, combining data from different sources into a single coherent data structure is a common task. Pandas, a powerful data manipulation library in Python, provides several functions to perform these operations, notably concat()
and merge()
. This article will explore the differences between these two functions, their use cases, and how to apply them effectively in various scenarios.
Understanding Pandas concat()
The concat()
function in Pandas is used to concatenate pandas objects along a particular axis with optional set logic along the other axes. It can be used to concatenate Series, DataFrame, or Panel objects.
Example 1: Basic Concatenation of Two DataFrames
import pandas as pd
# Create two DataFrames
df1 = pd.DataFrame({
'A': ['A0', 'A1', 'A2', 'A3'],
'B': ['B0', 'B1', 'B2', 'B3'],
'C': ['C0', 'C1', 'C2', 'C3'],
'D': ['D0', 'D1', 'D2', 'D3']
}, index=[0, 1, 2, 3])
df2 = pd.DataFrame({
'A': ['A4', 'A5', 'A6', 'A7'],
'B': ['B4', 'B5', 'B6', 'B7'],
'C': ['C4', 'C5', 'C6', 'C7'],
'D': ['D4', 'D5', 'D6', 'D7']
}, index=[4, 5, 6, 7])
result = pd.concat([df1, df2])
print(result)
Output:
Example 2: Concatenation with Axis Specified
import pandas as pd
# Create two DataFrames
df1 = pd.DataFrame({
'A': ['A0', 'A1', 'A2', 'A3'],
'B': ['B0', 'B1', 'B2', 'B3'],
'C': ['C0', 'C1', 'C2', 'C3'],
'D': ['D0', 'D1', 'D2', 'D3']
}, index=[0, 1, 2, 3])
df2 = pd.DataFrame({
'A': ['A4', 'A5', 'A6', 'A7'],
'B': ['B4', 'B5', 'B6', 'B7'],
'C': ['C4', 'C5', 'C6', 'C7'],
'D': ['D4', 'D5', 'D6', 'D7']
}, index=[4, 5, 6, 7])
result = pd.concat([df1, df2], axis=1)
print(result)
Output:
Example 3: Concatenation with Index Reorganization
import pandas as pd
# Create two DataFrames
df1 = pd.DataFrame({
'A': ['A0', 'A1', 'A2', 'A3'],
'B': ['B0', 'B1', 'B2', 'B3'],
'C': ['C0', 'C1', 'C2', 'C3'],
'D': ['D0', 'D1', 'D2', 'D3']
}, index=[0, 1, 2, 3])
df2 = pd.DataFrame({
'A': ['A4', 'A5', 'A6', 'A7'],
'B': ['B4', 'B5', 'B6', 'B7'],
'C': ['C4', 'C5', 'C6', 'C7'],
'D': ['D4', 'D5', 'D6', 'D7']
}, index=[4, 5, 6, 7])
result = pd.concat([df1, df2], ignore_index=True)
print(result)
Output:
Understanding Pandas merge()
The merge()
function is used to merge DataFrame or named Series objects with a database-style join. The join is done on columns or indexes. If joining columns on columns, the DataFrame indexes will be ignored.
Example 4: Basic Merge Operation
import pandas as pd
# Create two DataFrames
df1 = pd.DataFrame({
'A': ['A0', 'A1', 'A2', 'A3'],
'B': ['B0', 'B1', 'B2', 'B3'],
'C': ['C0', 'C1', 'C2', 'C3'],
'D': ['D0', 'D1', 'D2', 'D3']
}, index=[0, 1, 2, 3])
df2 = pd.DataFrame({
'A': ['A4', 'A5', 'A6', 'A7'],
'B': ['B4', 'B5', 'B6', 'B7'],
'C': ['C4', 'C5', 'C6', 'C7'],
'D': ['D4', 'D5', 'D6', 'D7']
}, index=[4, 5, 6, 7])
left = pd.DataFrame({
'key': ['K0', 'K1', 'K2', 'K3'],
'A': ['A0', 'A1', 'A2', 'A3'],
'B': ['B0', 'B1', 'B2', 'B3']
})
right = pd.DataFrame({
'key': ['K0', 'K1', 'K2', 'K3'],
'C': ['C0', 'C1', 'C2', 'C3'],
'D': ['D0', 'D1', 'D2', 'D3']
})
result = pd.merge(left, right, on='key')
print(result)
Output:
Example 5: Merge with Different Key Names
import pandas as pd
# Create two DataFrames
df1 = pd.DataFrame({
'A': ['A0', 'A1', 'A2', 'A3'],
'B': ['B0', 'B1', 'B2', 'B3'],
'C': ['C0', 'C1', 'C2', 'C3'],
'D': ['D0', 'D1', 'D2', 'D3']
}, index=[0, 1, 2, 3])
df2 = pd.DataFrame({
'A': ['A4', 'A5', 'A6', 'A7'],
'B': ['B4', 'B5', 'B6', 'B7'],
'C': ['C4', 'C5', 'C6', 'C7'],
'D': ['D4', 'D5', 'D6', 'D7']
}, index=[4, 5, 6, 7])
left = pd.DataFrame({
'left_key': ['K0', 'K1', 'K2', 'K3'],
'A': ['A0', 'A1', 'A2', 'A3']
})
right = pd.DataFrame({
'right_key': ['K0', 'K1', 'K2', 'K3'],
'B': ['B0', 'B1', 'B2', 'B3']
})
result = pd.merge(left, right, left_on='left_key', right_on='right_key')
print(result)
Output:
Comparing concat()
and merge()
While both concat()
and merge()
are used for combining data, they serve different purposes and are used in different scenarios.
Example 6: Concatenation vs. Merge
import pandas as pd
# Create two DataFrames
df1 = pd.DataFrame({
'A': ['A0', 'A1', 'A2', 'A3'],
'B': ['B0', 'B1', 'B2', 'B3'],
'C': ['C0', 'C1', 'C2', 'C3'],
'D': ['D0', 'D1', 'D2', 'D3']
}, index=[0, 1, 2, 3])
df2 = pd.DataFrame({
'A': ['A4', 'A5', 'A6', 'A7'],
'B': ['B4', 'B5', 'B6', 'B7'],
'C': ['C4', 'C5', 'C6', 'C7'],
'D': ['D4', 'D5', 'D6', 'D7']
}, index=[4, 5, 6, 7])
left = pd.DataFrame({
'left_key': ['K0', 'K1', 'K2', 'K3'],
'A': ['A0', 'A1', 'A2', 'A3']
})
right = pd.DataFrame({
'right_key': ['K0', 'K1', 'K2', 'K3'],
'B': ['B0', 'B1', 'B2', 'B3']
})
# Using concat
concat_result = pd.concat([df1, df2])
# Using merge
merge_result = pd.merge(df1, df2, on='A', how='inner')
print(concat_result)
print(merge_result)
Output:
Advanced Usage of concat()
Example 7: Concatenation with MultiIndex
import pandas as pd
df1 = pd.DataFrame({
'A': ['A0', 'A1', 'A2', 'A3'],
'B': ['B0', 'B1', 'B2', 'B3'],
'key': ['K0', 'K1', 'K2', 'K3']
}).set_index(['key'])
df2 = pd.DataFrame({
'C': ['C0', 'C1', 'C2', 'C3'],
'D': ['D0', 'D1', 'D2', 'D3'],
'key': ['K0', 'K1', 'K2', 'K3']
}).set_index(['key'])
result = pd.concat([df1, df2], axis=1)
print(result)
Output:
Example 8: Handling Overlapping Columns
import pandas as pd
df1 = pd.DataFrame({
'A': ['A0', 'A1', 'A2', 'A3'],
'B': ['B0', 'B1', 'B2', 'B3']
})
df2 = pd.DataFrame({
'B': ['B4', 'B5', 'B6', 'B7'],
'C': ['C0', 'C1', 'C2', 'C3']
})
result = pd.concat([df1, df2], sort=False)
print(result)
Output:
Advanced Usage of merge()
Example 9: Merge with Multiple Keys
import pandas as pd
left = pd.DataFrame({
'key1': ['K0', 'K1', 'K2', 'K3'],
'key2': ['K0', 'K1', 'K0', 'K1'],
'A': ['A0', 'A1', 'A2', 'A3']
})
right = pd.DataFrame({
'key1': ['K0', 'K1', 'K2', 'K3'],
'key2': ['K0', 'K0', 'K0', 'K0'],
'B': ['B0', 'B1', 'B2', 'B3']
})
result = pd.merge(left, right, on=['key1', 'key2'], how='inner')
print(result)
Output:
Pandas Concat vs Merge Conclusion
In this article, we explored the differences and use cases of concat()
and merge()
in pandas. While concat()
is generally used for combining data along an axis, merge()
is more suited for complex joins and relational data structures. Understanding when to use each function can significantly enhance your data manipulation capabilities in pandas.
This guide should serve as a starting point for applying these powerful pandas functions to your own data sets.