Pandas Concat Multiple DataFrames
Concatenating multiple DataFrames is a common task in data analysis, often required when you need to combine similar datasets from different sources. Pandas, a data manipulation library for Python, provides several ways to combine DataFrames. This article explores the pd.concat function in detail, with examples that illustrate its main use cases and options.
Introduction to DataFrame Concatenation
Concatenation refers to the process of appending one or more DataFrames to another, either vertically (row-wise) or horizontally (column-wise). The Pandas library offers the concat function, which is versatile and can handle various concatenation tasks with ease.
Basic Syntax of pd.concat
The basic syntax of the pd.concat function is as follows:
import pandas as pd
# pd.concat(objs, axis=0, join='outer', ignore_index=False, keys=None, levels=None, names=None, verify_integrity=False, sort=False)
objs: a sequence or mapping of Series or DataFrame objects.
axis: {0/'index', 1/'columns'}, default 0. The axis to concatenate along.
join: {'inner', 'outer'}, default 'outer'. How to handle the indexes on the other axis: 'outer' keeps the union of labels, 'inner' keeps only the labels shared by all inputs.
ignore_index: boolean, default False. If True, do not use the index values along the concatenation axis; the result is labeled 0, 1, ..., n - 1 instead.
keys: sequence, default None. Labels used to build the outermost level of a hierarchical index, one per input object. If multiple levels are passed, it should contain tuples.
levels: list of sequences, default None. Specific levels (unique values) to use for constructing a MultiIndex.
names: list, default None. Names for the levels in the resulting hierarchical index.
verify_integrity: boolean, default False. Check whether the new concatenated axis contains duplicates, and raise an error if it does.
sort: boolean, default False. Sort the non-concatenation axis if it is not already aligned.
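The parameters keys, names, and verify_integrity are easier to follow with a small case. The snippet below is a minimal sketch; the frames q1 and q2 and their contents are hypothetical and exist only for illustration.

```python
import pandas as pd

# Hypothetical sample frames that share the same index labels
q1 = pd.DataFrame({'sales': [10, 20]}, index=['north', 'south'])
q2 = pd.DataFrame({'sales': [30, 40]}, index=['north', 'south'])

# keys labels each input; names labels the levels of the resulting MultiIndex
labeled = pd.concat([q1, q2], keys=['Q1', 'Q2'], names=['quarter', 'region'])
print(labeled)

# verify_integrity=True raises ValueError here because 'north' and 'south'
# would each appear twice in the concatenated index
try:
    pd.concat([q1, q2], verify_integrity=True)
    duplicate_error = None
except ValueError as exc:
    duplicate_error = exc
print(duplicate_error)
```

Without keys or ignore_index, the duplicate labels would pass through silently, which is why verify_integrity can be a useful guard when the index is expected to be unique.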
Examples of Concatenating DataFrames
Example 1: Basic Vertical Concatenation
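import pandas as pd

# Two frames with the same columns and non-overlapping indices
df1 = pd.DataFrame({
    'A': ['A0', 'A1', 'A2', 'A3'],
    'B': ['B0', 'B1', 'B2', 'B3']
}, index=[0, 1, 2, 3])
df2 = pd.DataFrame({
    'A': ['A4', 'A5', 'A6', 'A7'],
    'B': ['B4', 'B5', 'B6', 'B7']
}, index=[4, 5, 6, 7])

# axis=0 is the default, so the rows of df2 are stacked below those of df1
result = pd.concat([df1, df2])
print(result)
Output:
    A   B
0  A0  B0
1  A1  B1
2  A2  B2
3  A3  B3
4  A4  B4
5  A5  B5
6  A6  B6
7  A7  B7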
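The same call accepts a list of any length, not only two frames. The following is a minimal sketch, assuming a set of hypothetical monthly frames that share the same columns; the names frames and combined are illustrative.

```python
import pandas as pd

# Hypothetical monthly frames with identical columns
frames = [
    pd.DataFrame({'month': [name], 'total': [total]})
    for name, total in [('Jan', 100), ('Feb', 120), ('Mar', 90)]
]

# One call handles the whole list; ignore_index renumbers the rows 0..2
combined = pd.concat(frames, ignore_index=True)
print(combined)
```

Collecting the frames in a list and calling pd.concat once is generally preferable to concatenating inside a loop, since each call builds a new DataFrame.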
Example 2: Horizontal Concatenation with Different Indices
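import pandas as pd

# The two frames share index labels 1 and 2 only
df1 = pd.DataFrame({
    'A': ['A0', 'A1', 'A2'],
    'B': ['B0', 'B1', 'B2']
}, index=[0, 1, 2])
df2 = pd.DataFrame({
    'C': ['C0', 'C1', 'C2'],
    'D': ['D0', 'D1', 'D2']
}, index=[1, 2, 3])

# axis=1 places the frames side by side, aligning rows on the index
result = pd.concat([df1, df2], axis=1)
print(result)
Output:
     A    B    C    D
0   A0   B0  NaN  NaN
1   A1   B1   C0   D0
2   A2   B2   C1   D1
3  NaN  NaN   C2   D2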
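When the indices only partly overlap, the join argument decides which row labels survive. A minimal sketch, using smaller hypothetical frames than the example above:

```python
import pandas as pd

df1 = pd.DataFrame({'A': ['A0', 'A1', 'A2']}, index=[0, 1, 2])
df2 = pd.DataFrame({'C': ['C0', 'C1', 'C2']}, index=[1, 2, 3])

outer = pd.concat([df1, df2], axis=1)                # union of labels: 0, 1, 2, 3
inner = pd.concat([df1, df2], axis=1, join='inner')  # shared labels only: 1, 2

print(outer.isna().sum())  # count of missing values per column
print(inner)
```

The outer result keeps every row at the cost of missing values; the inner result has no gaps but drops the rows that appear in only one frame.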
Example 3: Using the ignore_index Option
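import pandas as pd

# Both frames use the default index 0, 1, 2
df1 = pd.DataFrame({
    'A': ['A0', 'A1', 'A2'],
    'B': ['B0', 'B1', 'B2']
})
df2 = pd.DataFrame({
    'A': ['A3', 'A4', 'A5'],
    'B': ['B3', 'B4', 'B5']
})

# ignore_index=True discards the original labels and renumbers the rows
result = pd.concat([df1, df2], ignore_index=True)
print(result)
Output:
    A   B
0  A0  B0
1  A1  B1
2  A2  B2
3  A3  B3
4  A4  B4
5  A5  B5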
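To see why ignore_index matters, it helps to compare the result with and without it. A minimal sketch with hypothetical single-column frames:

```python
import pandas as pd

df1 = pd.DataFrame({'A': ['A0', 'A1', 'A2']})
df2 = pd.DataFrame({'A': ['A3', 'A4', 'A5']})

kept = pd.concat([df1, df2])                      # index: 0, 1, 2, 0, 1, 2
reset = pd.concat([df1, df2], ignore_index=True)  # index: 0, 1, 2, 3, 4, 5

# With duplicate labels, .loc[0] matches two rows instead of one
print(kept.loc[0])
print(reset.loc[0])
```

Duplicate labels are legal in pandas, but label-based lookups then return several rows, which is often not what downstream code expects.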
Example 4: Concatenation with MultiIndex Using keys
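import pandas as pd

df1 = pd.DataFrame({
    'A': ['A0', 'A1', 'A2'],
    'B': ['B0', 'B1', 'B2']
})
df2 = pd.DataFrame({
    'A': ['A3', 'A4', 'A5'],
    'B': ['B3', 'B4', 'B5']
})

# Each key labels the rows of one input, forming the outer level of a MultiIndex
result = pd.concat([df1, df2], keys=['Group1', 'Group2'])
print(result)
Output:
           A   B
Group1 0  A0  B0
       1  A1  B1
       2  A2  B2
Group2 0  A3  B3
       1  A4  B4
       2  A5  B5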
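Once the keys are in the index, they can be used to pull an original frame back out. The sketch below also shows that a dict of frames can be passed instead of a list plus keys; the variable names are illustrative.

```python
import pandas as pd

df1 = pd.DataFrame({'A': ['A0', 'A1'], 'B': ['B0', 'B1']})
df2 = pd.DataFrame({'A': ['A2', 'A3'], 'B': ['B2', 'B3']})

result = pd.concat([df1, df2], keys=['Group1', 'Group2'])

# The outer index level holds the keys, so .loc recovers the rows of one source
group2 = result.loc['Group2']
print(group2)

# Passing a mapping works too: its keys become the outer index level
from_dict = pd.concat({'Group1': df1, 'Group2': df2})
print(from_dict.index.equals(result.index))
```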
Example 5: Inner Join on Columns
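import pandas as pd

# The frames share only column B
df1 = pd.DataFrame({
    'A': ['A0', 'A1', 'A2'],
    'B': ['B0', 'B1', 'B2']
})
df2 = pd.DataFrame({
    'B': ['B2', 'B3', 'B4'],
    'C': ['C2', 'C3', 'C4']
})

# join='inner' keeps only the columns present in every frame
result = pd.concat([df1, df2], join='inner')
print(result)
Output:
    B
0  B0
1  B1
2  B2
0  B2
1  B3
2  B4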
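Note that join applies to the axis that is not being concatenated, which for a row-wise concat means the columns. A minimal sketch contrasting the two settings on hypothetical frames:

```python
import pandas as pd

df1 = pd.DataFrame({'A': ['A0', 'A1'], 'B': ['B0', 'B1']})
df2 = pd.DataFrame({'B': ['B2', 'B3'], 'C': ['C2', 'C3']})

inner = pd.concat([df1, df2], join='inner')  # keeps only the shared column B
outer = pd.concat([df1, df2], join='outer')  # keeps A, B, C and fills gaps with NaN

print(list(inner.columns))
print(list(outer.columns))
```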
Example 6: Concatenation with Different Columns and the sort Option
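import pandas as pd

df1 = pd.DataFrame({
    'A': ['A0', 'A1', 'A2'],
    'B': ['B0', 'B1', 'B2']
})
df2 = pd.DataFrame({
    'B': ['B3', 'B4', 'B5'],
    'C': ['C3', 'C4', 'C5']
})

# The default outer join keeps all columns; sort=True orders them alphabetically
result = pd.concat([df1, df2], sort=True)
print(result)
Output:
     A   B    C
0   A0  B0  NaN
1   A1  B1  NaN
2   A2  B2  NaN
0  NaN  B3   C3
1  NaN  B4   C4
2  NaN  B5   C5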
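The effect of sort only shows up when the columns are not already in order. The sketch below uses hypothetical frames whose columns are deliberately out of alphabetical order:

```python
import pandas as pd

# Columns are deliberately out of alphabetical order
df1 = pd.DataFrame({'B': ['B0', 'B1'], 'A': ['A0', 'A1']})
df2 = pd.DataFrame({'C': ['C2', 'C3'], 'A': ['A2', 'A3']})

unsorted_cols = pd.concat([df1, df2], sort=False)  # columns in order of first appearance
sorted_cols = pd.concat([df1, df2], sort=True)     # columns sorted alphabetically

print(list(unsorted_cols.columns))
print(list(sorted_cols.columns))
```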
Example 7: Concatenation with a Series
import pandas as pd

df1 = pd.DataFrame({
    'A': ['A0', 'A1', 'A2'],
    'B': ['B0', 'B1', 'B2']
})
# The Series name becomes the label of the new column
s1 = pd.Series(['S0', 'S1', 'S2'], name='S')

result = pd.concat([df1, s1], axis=1)
print(result)
Output:
    A   B   S
0  A0  B0  S0
1  A1  B1  S1
2  A2  B2  S2
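Two details are worth checking when a Series is involved: where the column label comes from, and how rows are matched. A minimal sketch, with the Series names scores and shifted chosen purely for illustration:

```python
import pandas as pd

df = pd.DataFrame({'A': ['A0', 'A1', 'A2']})

# The Series name becomes the column label; rename() sets one if it is missing
scores = pd.Series([0.5, 0.7, 0.9]).rename('score')
with_scores = pd.concat([df, scores], axis=1)
print(with_scores)

# Rows are matched by index label, not by position
shifted = pd.Series(['S1', 'S2', 'S3'], index=[1, 2, 3], name='S')
aligned = pd.concat([df, shifted], axis=1)
print(aligned)
```

Because alignment is by label, a Series whose index does not match the DataFrame produces extra rows and missing values rather than an error.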
Example 8: Handling Non-Alignment with join (join_axes Was Removed in pandas 1.0)
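import pandas as pd

df1 = pd.DataFrame({
    'A': ['A0', 'A1', 'A2'],
    'B': ['B0', 'B1', 'B2']
}, index=[0, 1, 2])
df2 = pd.DataFrame({
    'C': ['C0', 'C1', 'C2'],
    'D': ['D0', 'D1', 'D2']
}, index=[1, 2, 3])

# join='outer' (the default) keeps the union of row labels; unmatched cells become NaN
result = pd.concat([df1, df2], axis=1, join='outer')
print(result)
Output:
     A    B    C    D
0   A0   B0  NaN  NaN
1   A1   B1   C0   D0
2   A2   B2   C1   D1
3  NaN  NaN   C2   D2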
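Older code sometimes passed a join_axes argument to keep the row labels of one particular frame. That argument was removed in pandas 1.0; one common replacement, sketched below, is to reindex the other frame before concatenating. The name left_aligned is illustrative.

```python
import pandas as pd

df1 = pd.DataFrame({'A': ['A0', 'A1', 'A2'], 'B': ['B0', 'B1', 'B2']}, index=[0, 1, 2])
df2 = pd.DataFrame({'C': ['C0', 'C1', 'C2'], 'D': ['D0', 'D1', 'D2']}, index=[1, 2, 3])

# Keep exactly the row labels of df1, roughly what join_axes=[df1.index] used to do
left_aligned = pd.concat([df1, df2.reindex(df1.index)], axis=1)
print(left_aligned)
```

Row 0 has no match in df2 and gets missing values in C and D, while row 3 of df2 is dropped because it is not in df1's index.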
Example 9: Concatenation with Different Data Types
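import pandas as pd

# df1 holds integers, df2 holds floats
df1 = pd.DataFrame({
    'A': [1, 2, 3],
    'B': [4, 5, 6]
})
df2 = pd.DataFrame({
    'A': [7.1, 8.2, 9.3],
    'B': [10.4, 11.5, 12.6]
})

# The integer values are upcast to float64 so each column has a single dtype
result = pd.concat([df1, df2])
print(result)
Output:
     A     B
0  1.0   4.0
1  2.0   5.0
2  3.0   6.0
0  7.1  10.4
1  8.2  11.5
2  9.3  12.6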
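Inspecting dtypes after the call makes the type promotion explicit. A minimal sketch with hypothetical frames, including a case where numbers are mixed with strings:

```python
import pandas as pd

ints = pd.DataFrame({'A': [1, 2, 3]})
floats = pd.DataFrame({'A': [7.1, 8.2, 9.3]})
texts = pd.DataFrame({'A': ['x', 'y', 'z']})

numeric = pd.concat([ints, floats])
mixed = pd.concat([ints, texts])

print(numeric.dtypes)  # integers and floats combine to float64
print(mixed.dtypes)    # numbers mixed with strings fall back to object
```

An object column still works, but numeric operations on it are slower and may fail, so checking dtypes after combining heterogeneous sources is a cheap safeguard.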
Pandas Concat Multiple DataFrames Conclusion
Concatenating DataFrames is a fundamental operation in data analysis, allowing analysts to combine data from multiple sources into a single DataFrame for further analysis. The pd.concat function in Pandas is highly flexible, supporting various options to tailor the concatenation process to specific needs. By understanding and utilizing these options effectively, you can handle most data concatenation scenarios efficiently.