Pandas Concat Two DataFrames
Pandas is a powerful Python library for data manipulation and analysis. One of its core features is the ability to concatenate, or "concat", multiple dataframes into one, which is useful when separate datasets need to be analyzed together. In this article, we will explore various ways to concatenate two dataframes using the pandas.concat() function, with detailed examples and explanations.
Introduction to pandas.concat()
The pandas.concat() function is versatile, allowing for both simple and complex concatenations. The basic syntax of pandas.concat() is:
pandas.concat(objs, axis=0, join='outer', ignore_index=False, keys=None, levels=None, names=None, verify_integrity=False, sort=False, copy=True)
Where:
– objs: a sequence or mapping of Series or DataFrame objects.
– axis: {0/'index', 1/'columns'}, default 0. The axis to concatenate along.
– join: {'inner', 'outer'}, default 'outer'. How to handle indexes on the other axis: 'outer' takes their union, 'inner' their intersection.
– ignore_index: boolean, default False. If True, do not use the index values along the concatenation axis; the result is labeled 0, ..., n - 1 instead.
– keys: sequence, default None. Constructs a hierarchical index using the passed keys as the outermost level. If multiple levels are passed, it should contain tuples.
– verify_integrity: boolean, default False. Check whether the new concatenated axis contains duplicates, and raise a ValueError if it does.
– sort: boolean, default False. Sort the non-concatenation axis if it is not already aligned.
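As a quick illustration of how a few of these parameters interact, here is a minimal sketch. The frames `left` and `right` and their contents are made up for this example and are not part of the article's datasets.

```python
import pandas as pd

# Two small, hypothetical frames that share only the column 'k'.
left = pd.DataFrame({'k': [1, 2], 'a': ['x', 'y']})
right = pd.DataFrame({'k': [3, 4], 'b': ['p', 'q']})

# axis=0 with the default join='outer': rows are stacked and the
# columns are the union of both inputs (missing cells become NaN).
stacked = pd.concat([left, right], ignore_index=True)

# join='inner' keeps only the columns present in every input.
shared_only = pd.concat([left, right], join='inner', ignore_index=True)

# axis=1 places the frames side by side, aligning rows on the index.
side_by_side = pd.concat([left, right], axis=1)

print(stacked.columns.tolist())      # ['k', 'a', 'b']
print(shared_only.columns.tolist())  # ['k']
print(side_by_side.shape)            # (2, 4)
```

The same call therefore gives quite different shapes depending on `axis` and `join`, which is why the examples that follow vary one parameter at a time.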
Basic Concatenation of Two DataFrames
Let’s start with a simple example where we concatenate two dataframes vertically and horizontally.
Example 1: Vertical Concatenation
import pandas as pd
# Create two dataframes with the same columns and disjoint index labels
df1 = pd.DataFrame({
    'A': ['A0', 'A1', 'A2', 'A3'],
    'B': ['B0', 'B1', 'B2', 'B3'],
    'C': ['C0', 'C1', 'C2', 'C3'],
    'D': ['D0', 'D1', 'D2', 'D3']
}, index=[0, 1, 2, 3])
df2 = pd.DataFrame({
    'A': ['A4', 'A5', 'A6', 'A7'],
    'B': ['B4', 'B5', 'B6', 'B7'],
    'C': ['C4', 'C5', 'C6', 'C7'],
    'D': ['D4', 'D5', 'D6', 'D7']
}, index=[4, 5, 6, 7])
# Stack df2's rows below df1's rows (axis=0 is the default)
result = pd.concat([df1, df2])
print(result)
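Output:
The result has eight rows with index labels 0 to 7 and columns A to D: the rows of df1 followed by the rows of df2.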
Example 2: Horizontal Concatenation
import pandas as pd
# Create two dataframes with the same columns and disjoint index labels
df1 = pd.DataFrame({
    'A': ['A0', 'A1', 'A2', 'A3'],
    'B': ['B0', 'B1', 'B2', 'B3'],
    'C': ['C0', 'C1', 'C2', 'C3'],
    'D': ['D0', 'D1', 'D2', 'D3']
}, index=[0, 1, 2, 3])
df2 = pd.DataFrame({
    'A': ['A4', 'A5', 'A6', 'A7'],
    'B': ['B4', 'B5', 'B6', 'B7'],
    'C': ['C4', 'C5', 'C6', 'C7'],
    'D': ['D4', 'D5', 'D6', 'D7']
}, index=[4, 5, 6, 7])
# Place df2's columns to the right of df1's, aligning rows by index label
result = pd.concat([df1, df2], axis=1)
print(result)
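Output:
The result has eight rows (index 0 to 7) and eight columns (A, B, C, D, A, B, C, D). Because the two indexes do not overlap, rows 0 to 3 hold df1's values with NaN in the second set of columns, and rows 4 to 7 hold df2's values with NaN in the first set.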
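Horizontal concatenation is most often used when the frames share an index, so that each row simply gains extra columns. A minimal sketch of that case, assuming two hypothetical frames `names` and `scores` with identical index labels:

```python
import pandas as pd

# Hypothetical frames that describe the same three rows.
names = pd.DataFrame({'name': ['Ann', 'Bob', 'Cy']}, index=[0, 1, 2])
scores = pd.DataFrame({'score': [90, 85, 77]}, index=[0, 1, 2])

# Rows are matched on their index labels, so nothing needs padding.
combined = pd.concat([names, scores], axis=1)

print(combined.shape)             # (3, 2)
print(combined.columns.tolist())  # ['name', 'score']
```

If the index labels differ between the frames, pandas still aligns on them rather than on row position, so it is worth checking the indexes before concatenating along axis=1.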
Handling Indexes in Concatenation
When concatenating dataframes, handling indexes correctly is crucial to avoid data misalignment.
Example 3: Ignoring the Index
import pandas as pd
# Create two dataframes
df1 = pd.DataFrame({
    'A': ['A0', 'A1', 'A2', 'A3'],
    'B': ['B0', 'B1', 'B2', 'B3'],
    'C': ['C0', 'C1', 'C2', 'C3'],
    'D': ['D0', 'D1', 'D2', 'D3']
}, index=[0, 1, 2, 3])
df2 = pd.DataFrame({
    'A': ['A4', 'A5', 'A6', 'A7'],
    'B': ['B4', 'B5', 'B6', 'B7'],
    'C': ['C4', 'C5', 'C6', 'C7'],
    'D': ['D4', 'D5', 'D6', 'D7']
}, index=[4, 5, 6, 7])
# Discard the original index labels and number the rows 0 to n - 1
result = pd.concat([df1, df2], ignore_index=True)
print(result)
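Output:
The result has eight rows labeled 0 to 7 by a new default index. Since df1 and df2 already carried the labels 0 to 3 and 4 to 7, the frame looks the same as in Example 1; ignore_index=True makes a visible difference when the inputs have overlapping labels.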
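The effect of ignore_index is easier to see when the inputs share labels. The sketch below assumes two hypothetical frames, `jan` and `feb`, that both use the default index 0, 1:

```python
import pandas as pd

# Both hypothetical frames use the default index labels 0 and 1.
jan = pd.DataFrame({'sales': [10, 20]})
feb = pd.DataFrame({'sales': [30, 40]})

kept = pd.concat([jan, feb])                           # labels: 0, 1, 0, 1
renumbered = pd.concat([jan, feb], ignore_index=True)  # labels: 0, 1, 2, 3

# With duplicate labels, a label lookup returns every matching row.
print(kept.index.tolist())        # [0, 1, 0, 1]
print(len(kept.loc[0]))           # 2
print(renumbered.index.tolist())  # [0, 1, 2, 3]
```

Keeping the original labels is fine when they carry meaning (dates, IDs); when they are just row numbers, renumbering avoids ambiguous lookups later.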
Example 4: Adding Multi-level Index
import pandas as pd
# Create two dataframes
df1 = pd.DataFrame({
    'A': ['A0', 'A1', 'A2', 'A3'],
    'B': ['B0', 'B1', 'B2', 'B3'],
    'C': ['C0', 'C1', 'C2', 'C3'],
    'D': ['D0', 'D1', 'D2', 'D3']
}, index=[0, 1, 2, 3])
df2 = pd.DataFrame({
    'A': ['A4', 'A5', 'A6', 'A7'],
    'B': ['B4', 'B5', 'B6', 'B7'],
    'C': ['C4', 'C5', 'C6', 'C7'],
    'D': ['D4', 'D5', 'D6', 'D7']
}, index=[4, 5, 6, 7])
# Tag the rows of df1 with 'x' and the rows of df2 with 'y'
result = pd.concat([df1, df2], keys=['x', 'y'])
print(result)
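Output:
The result has a two-level index: the outer level is 'x' for the rows from df1 and 'y' for the rows from df2, and the inner level keeps the original labels 0 to 7.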
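Once the keys are in place, they can be used to pull each source back out of the combined frame. The following is a small sketch on shortened, hypothetical frames; the level names 'source' and 'row' are made up for illustration.

```python
import pandas as pd

first = pd.DataFrame({'A': ['A0', 'A1']}, index=[0, 1])
second = pd.DataFrame({'A': ['A2', 'A3']}, index=[2, 3])

# names= labels the two index levels created by keys=.
labeled = pd.concat([first, second], keys=['x', 'y'], names=['source', 'row'])

# Select every row that came from the first input.
from_x = labeled.loc['x']

# Select a single cell with a (key, original label) tuple.
cell = labeled.loc[('y', 3), 'A']

print(list(labeled.index.names))  # ['source', 'row']
print(from_x.index.tolist())      # [0, 1]
print(cell)                       # A3
```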
Concatenation with Different Column Names
Sometimes, dataframes might not have the same column names. Here’s how to handle such cases.
Example 5: Concatenation with Non-Matching Columns
import pandas as pd
# Create two dataframes that have no column names in common
df1 = pd.DataFrame({
    'A': ['A0', 'A1', 'A2', 'A3'],
    'B': ['B0', 'B1', 'B2', 'B3'],
    'C': ['C0', 'C1', 'C2', 'C3'],
    'D': ['D0', 'D1', 'D2', 'D3']
}, index=[0, 1, 2, 3])
df3 = pd.DataFrame({
    'E': ['E0', 'E1', 'E2', 'E3'],
    'F': ['F0', 'F1', 'F2', 'F3'],
    'G': ['G0', 'G1', 'G2', 'G3'],
    'H': ['H0', 'H1', 'H2', 'H3']
}, index=[0, 1, 2, 3])
# The default outer join keeps every column; sort=False preserves their order
result = pd.concat([df1, df3], sort=False)
print(result)
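Output:
The result has eight rows and columns A to H. The rows from df1 contain NaN in columns E to H, the rows from df3 contain NaN in columns A to D, and the index labels 0 to 3 appear twice.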
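In practice the columns usually overlap partially rather than not at all. A minimal sketch of that situation, using hypothetical `orders` and `refunds` frames, counts how many NaN values the outer join introduces per column:

```python
import pandas as pd

# Hypothetical frames with one shared column ('id') and one unique column each.
orders = pd.DataFrame({'id': [1, 2], 'amount': [9.5, 3.0]})
refunds = pd.DataFrame({'id': [3], 'reason': ['damaged']})

merged = pd.concat([orders, refunds], ignore_index=True)

# Count the missing values pandas inserted in each column.
missing = {col: int(n) for col, n in merged.isna().sum().items()}

print(merged.columns.tolist())  # ['id', 'amount', 'reason']
print(missing)                  # {'id': 0, 'amount': 1, 'reason': 2}
```

Checking the NaN counts right after concatenation is a cheap way to confirm that columns lined up the way you expected.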
Advanced Concatenation Techniques
Example 6: Using Inner Join
import pandas as pd
# Create two dataframes that have no column names in common
df1 = pd.DataFrame({
    'A': ['A0', 'A1', 'A2', 'A3'],
    'B': ['B0', 'B1', 'B2', 'B3'],
    'C': ['C0', 'C1', 'C2', 'C3'],
    'D': ['D0', 'D1', 'D2', 'D3']
}, index=[0, 1, 2, 3])
df3 = pd.DataFrame({
    'E': ['E0', 'E1', 'E2', 'E3'],
    'F': ['F0', 'F1', 'F2', 'F3'],
    'G': ['G0', 'G1', 'G2', 'G3'],
    'H': ['H0', 'H1', 'H2', 'H3']
}, index=[0, 1, 2, 3])
# join='inner' keeps only the columns that appear in every input
result = pd.concat([df1, df3], join='inner')
print(result)
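Output:
Because df1 and df3 share no column names, the inner join leaves no columns at all: print reports an empty DataFrame whose index is 0, 1, 2, 3, 0, 1, 2, 3.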
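An inner join is more informative when some columns are shared. The sketch below uses hypothetical frames `p`, `q`, and `r` to contrast a partial overlap with no overlap:

```python
import pandas as pd

# Hypothetical frames: 'A' and 'B' are shared, 'C' and 'E' are not.
p = pd.DataFrame({'A': [1, 2], 'B': [3, 4], 'C': [5, 6]})
q = pd.DataFrame({'A': [7, 8], 'B': [9, 10], 'E': [11, 12]})

# Only the shared columns survive the inner join.
common = pd.concat([p, q], join='inner', ignore_index=True)
print(common.columns.tolist())  # ['A', 'B']
print(common.shape)             # (4, 2)

# With no shared columns at all, the rows survive but no columns do.
r = pd.DataFrame({'X': [0, 1]})
nothing_shared = pd.concat([p, r], join='inner')
print(nothing_shared.shape)     # (4, 0)
```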
Example 7: Concatenation with Custom Index
import pandas as pd
# Create two dataframes whose index labels are not contiguous (0-3 and 8-11)
df1 = pd.DataFrame({
    'A': ['A0', 'A1', 'A2', 'A3'],
    'B': ['B0', 'B1', 'B2', 'B3'],
    'C': ['C0', 'C1', 'C2', 'C3'],
    'D': ['D0', 'D1', 'D2', 'D3']
}, index=[0, 1, 2, 3])
df4 = pd.DataFrame({
    'A': ['A8', 'A9', 'A10', 'A11'],
    'B': ['B8', 'B9', 'B10', 'B11'],
    'C': ['C8', 'C9', 'C10', 'C11'],
    'D': ['D8', 'D9', 'D10', 'D11']
}, index=[8, 9, 10, 11])
# Replace the custom labels with a fresh 0 to n - 1 index
result = pd.concat([df1, df4], ignore_index=True)
print(result)
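Output:
The result has eight rows labeled 0 to 7: df4's original labels 8 to 11 are discarded and its rows become rows 4 to 7.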
Example 8: Verifying Integrity
import pandas as pd
# Create a dataframe; concatenating it with itself duplicates every index label
df1 = pd.DataFrame({
    'A': ['A0', 'A1', 'A2', 'A3'],
    'B': ['B0', 'B1', 'B2', 'B3'],
    'C': ['C0', 'C1', 'C2', 'C3'],
    'D': ['D0', 'D1', 'D2', 'D3']
}, index=[0, 1, 2, 3])
try:
    # verify_integrity=True raises ValueError when the new index has duplicates
    result = pd.concat([df1, df1], verify_integrity=True)
except ValueError as e:
    print("Error:", e)
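Output:
The call raises a ValueError because the labels 0 to 3 appear in both inputs, so the except block prints a line starting with "Error:" that reports the overlapping index values. The exact wording of the message depends on the pandas version.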
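When raising an exception is too strict, the same condition can be inspected after the fact. The following sketch, on a shortened hypothetical frame `df`, shows the check failing, the duplicates being listed, and ignore_index=True avoiding the problem by building new labels:

```python
import pandas as pd

df = pd.DataFrame({'A': ['A0', 'A1']}, index=[0, 1])

# With verify_integrity=True the duplicate labels are rejected.
try:
    pd.concat([df, df], verify_integrity=True)
    raised = False
except ValueError:
    raised = True

# Without it the duplicates are accepted silently and can be inspected.
dup = pd.concat([df, df])
repeated = dup.index[dup.index.duplicated()].tolist()

# ignore_index=True creates fresh labels, so there is nothing to collide.
clean = pd.concat([df, df], ignore_index=True, verify_integrity=True)

print(raised)                 # True
print(dup.index.is_unique)    # False
print(repeated)               # [0, 1]
print(clean.index.is_unique)  # True
```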
Example 9: Concatenation with Different Data Types
import pandas as pd
# Create a dataframe of strings and a dataframe with numeric, boolean, and datetime columns
df1 = pd.DataFrame({
    'A': ['A0', 'A1', 'A2', 'A3'],
    'B': ['B0', 'B1', 'B2', 'B3'],
    'C': ['C0', 'C1', 'C2', 'C3'],
    'D': ['D0', 'D1', 'D2', 'D3']
}, index=[0, 1, 2, 3])
df5 = pd.DataFrame({
    'A': [1, 2, 3, 4],
    'B': [1.1, 2.2, 3.3, 4.4],
    'C': [True, False, True, False],
    'D': pd.to_datetime(['2021-01-01', '2021-02-01', '2021-03-01', '2021-04-01'])
})
# Columns are matched by name even though their data types differ
result = pd.concat([df1, df5])
print(result)
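Output:
The result has eight rows, with the index labels 0 to 3 appearing twice. Each column now mixes strings with integers, floats, booleans, or timestamps, so all four columns end up with the generic object dtype.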
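How pandas chooses the resulting dtype is worth checking explicitly, because an object column loses fast numeric operations. A minimal sketch with hypothetical frames, contrasting a string/number mix with an integer/float mix:

```python
import pandas as pd

# Mixing strings and numbers in the same column falls back to object.
text = pd.DataFrame({'A': ['A0', 'A1']})
nums = pd.DataFrame({'A': [1, 2]})
mixed = pd.concat([text, nums], ignore_index=True)

# Mixing integers and floats is upcast to a common numeric dtype instead.
ints = pd.DataFrame({'v': [1, 2]})
floats = pd.DataFrame({'v': [0.5, 1.5]})
numeric = pd.concat([ints, floats], ignore_index=True)

print(mixed['A'].dtype)    # object
print(numeric['v'].dtype)  # float64
```

Inspecting result.dtypes after a concatenation is a simple way to catch an unintended fallback to object before it affects later calculations.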
Example 10: Concatenation with Different Index Sets
import pandas as pd
# Create two dataframes whose index labels do not overlap (0-3 and 12-15)
df1 = pd.DataFrame({
    'A': ['A0', 'A1', 'A2', 'A3'],
    'B': ['B0', 'B1', 'B2', 'B3'],
    'C': ['C0', 'C1', 'C2', 'C3'],
    'D': ['D0', 'D1', 'D2', 'D3']
}, index=[0, 1, 2, 3])
df6 = pd.DataFrame({
    'A': ['A12', 'A13', 'A14', 'A15'],
    'B': ['B12', 'B13', 'B14', 'B15'],
    'C': ['C12', 'C13', 'C14', 'C15'],
    'D': ['D12', 'D13', 'D14', 'D15']
}, index=[12, 13, 14, 15])
# Along axis=1 the rows are aligned by label; unmatched labels are filled with NaN
result = pd.concat([df1, df6], axis=1)
print(result)
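Output:
The result has eight rows (index 0 to 3 and 12 to 15) and eight columns (A, B, C, D, A, B, C, D). No index label is shared, so each row holds values from only one of the two frames and NaN in the other frame's columns.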
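When the index sets overlap only partly, the join parameter decides which rows survive. The sketch below uses two hypothetical frames `a` and `b`, and also shows one way to pair rows by position instead of by label:

```python
import pandas as pd

a = pd.DataFrame({'A': ['A0', 'A1', 'A2']}, index=[0, 1, 2])
b = pd.DataFrame({'B': ['B1', 'B2', 'B3']}, index=[1, 2, 3])

# Default outer join: union of the labels, NaN where a frame has no row.
outer = pd.concat([a, b], axis=1)
print(outer.index.tolist())  # [0, 1, 2, 3]

# Inner join: only the labels present in both frames.
inner = pd.concat([a, b], axis=1, join='inner')
print(inner.index.tolist())  # [1, 2]

# To pair rows by position rather than by label, drop the labels first.
by_position = pd.concat(
    [a.reset_index(drop=True), b.reset_index(drop=True)], axis=1
)
print(by_position.loc[0].tolist())  # ['A0', 'B1']
```

Resetting the index is only appropriate when row order is meaningful in both frames; otherwise aligning on labels is the safer default.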
Conclusion
Concatenating dataframes is a fundamental part of data manipulation and analysis with Pandas. Whether you are stacking large datasets or combining smaller pieces of data, understanding how pandas.concat() treats the axis, the index, and non-matching columns is essential for any data scientist or analyst. This article has covered vertical and horizontal concatenation, index handling, inner and outer joins, integrity checks, and mixed data types.