Pandas Concat DataFrames
Pandas is a Python library for data manipulation and analysis. One of its core operations is concatenation: combining multiple DataFrames along a chosen axis. This is useful when data arrives from several sources or when an operation has to run across several structured datasets at once. This article explores the pandas.concat() function in depth, with practical examples of its usage.
Understanding pandas.concat()
The pandas.concat() function concatenates two or more pandas DataFrames or Series along one axis, either vertically (axis=0) or horizontally (axis=1). Its parameters control how indexes are handled, what happens to column labels that do not match, and where missing values appear in the result.
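As a minimal sketch of the two axes, the following stacks two small frames both ways and compares the resulting shapes. The frame names `left` and `right` are illustrative and do not appear elsewhere in this article.

```python
import pandas as pd

# Two small frames with the same row labels and the same column labels
left = pd.DataFrame({"A": [1, 2], "B": [3, 4]})
right = pd.DataFrame({"A": [5, 6], "B": [7, 8]})

# axis=0 stacks rows: 2 + 2 rows, columns unchanged
stacked = pd.concat([left, right], axis=0)

# axis=1 places the frames side by side: rows unchanged, 2 + 2 columns
side_by_side = pd.concat([left, right], axis=1)

print(stacked.shape)       # (4, 2)
print(side_by_side.shape)  # (2, 4)
```

Note that the horizontal result here contains duplicate column labels (A, B, A, B), since concat does not rename columns.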
Syntax of pandas.concat()
The basic syntax of pandas.concat() is as follows:
pandas.concat(objs, axis=0, join='outer', ignore_index=False, keys=None, levels=None, names=None, verify_integrity=False, sort=False, copy=True)
objs: a sequence or mapping of Series or DataFrame objects.
axis: {0, 1}, default 0. The axis to concatenate along.
join: {'inner', 'outer'}, default 'outer'. How to handle the indexes on the other axis: 'outer' keeps their union, 'inner' keeps their intersection.
ignore_index: boolean, default False. If True, do not use the index values along the concatenation axis; the result is labeled 0 to n - 1 instead.
keys: sequence, default None. Construct a hierarchical index using the passed keys as the outermost level.
levels: list of sequences, default None. Specific levels (unique values) to use for constructing a MultiIndex.
names: list, default None. Names for the levels in the resulting hierarchical index.
verify_integrity: boolean, default False. Check whether the new concatenated axis contains duplicates, and raise an error if it does.
sort: boolean, default False. Sort the non-concatenation axis if it is not already aligned.
copy: boolean, default True. If False, avoid copying data where possible. The default and effect of this parameter differ between pandas versions, and releases that use Copy-on-Write may ignore it.
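To make the join parameter concrete, here is a small sketch for vertical concatenation, where join decides which columns survive. The frames `first` and `second` are hypothetical and share only column B.

```python
import pandas as pd

# Hypothetical frames that share only column "B"
first = pd.DataFrame({"A": [1, 2], "B": [3, 4]})
second = pd.DataFrame({"B": [5, 6], "C": [7, 8]})

# join='outer' keeps the union of columns; gaps are filled with NaN
outer = pd.concat([first, second], join="outer", ignore_index=True)

# join='inner' keeps only the columns present in every input
inner = pd.concat([first, second], join="inner", ignore_index=True)

print(list(outer.columns))  # ['A', 'B', 'C']
print(list(inner.columns))  # ['B']
```

With ignore_index=True both results are labeled 0 through 3, which avoids the repeated row labels that a plain vertical concatenation would produce.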
Examples of Using pandas.concat()
Example 1: Basic Vertical Concatenation
import pandas as pd
# Create two DataFrames
df1 = pd.DataFrame({
'A': ['A0', 'A1', 'A2', 'A3'],
'B': ['B0', 'B1', 'B2', 'B3']
}, index=[0, 1, 2, 3])
df2 = pd.DataFrame({
'A': ['A4', 'A5', 'A6', 'A7'],
'B': ['B4', 'B5', 'B6', 'B7']
}, index=[4, 5, 6, 7])
# Concatenate DataFrames
result = pd.concat([df1, df2])
print(result)
Output: a single DataFrame of eight rows, labeled 0 through 7, with columns A and B.
Example 2: Horizontal Concatenation with Different Indices
import pandas as pd
# Create two DataFrames
df1 = pd.DataFrame({
'A': ['A0', 'A1', 'A2'],
'B': ['B0', 'B1', 'B2']
}, index=[0, 1, 2])
df2 = pd.DataFrame({
'C': ['C0', 'C1', 'C2'],
'D': ['D0', 'D1', 'D2']
}, index=[1, 2, 3])
# Concatenate DataFrames horizontally
result = pd.concat([df1, df2], axis=1)
print(result)
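Output: the row labels are the union 0 through 3. Row 0 has NaN in C and D, and row 3 has NaN in A and B, because each of those labels exists in only one input.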
Example 3: Concatenation with MultiIndex Keys
import pandas as pd
# Create two DataFrames
df1 = pd.DataFrame({
'A': ['A0', 'A1', 'A2'],
'B': ['B0', 'B1', 'B2']
})
df2 = pd.DataFrame({
'A': ['A3', 'A4', 'A5'],
'B': ['B3', 'B4', 'B5']
})
# Concatenate with keys
result = pd.concat([df1, df2], keys=['pandasdataframe.com1', 'pandasdataframe.com2'])
print(result)
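Output: the six rows carry a two-level MultiIndex. The outer level is the key ('pandasdataframe.com1' or 'pandasdataframe.com2') and the inner level is the original row label 0 to 2.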
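One reason to pass keys is that the source of each row stays addressable afterwards. The sketch below uses the illustrative keys "first" and "second" (not taken from the example above) and selects one group back out with .loc.

```python
import pandas as pd

df1 = pd.DataFrame({"A": ["A0", "A1"], "B": ["B0", "B1"]})
df2 = pd.DataFrame({"A": ["A2", "A3"], "B": ["B2", "B3"]})

# "first" and "second" are illustrative labels for the two inputs
combined = pd.concat([df1, df2], keys=["first", "second"])
print(combined.index.nlevels)  # 2

# Selecting on the outer level returns only the rows that came from df2
from_second = combined.loc["second"]
print(from_second["A"].tolist())  # ['A2', 'A3']
```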
Example 4: Ignoring the Index
import pandas as pd
# Create two DataFrames
df1 = pd.DataFrame({
'A': ['A0', 'A1', 'A2'],
'B': ['B0', 'B1', 'B2']
})
df2 = pd.DataFrame({
'A': ['A3', 'A4', 'A5'],
'B': ['B3', 'B4', 'B5']
})
# Concatenate ignoring the index
result = pd.concat([df1, df2], ignore_index=True)
print(result)
Output: the original labels are discarded and the six rows are renumbered 0 through 5.
Example 5: Concatenation with Different Columns
import pandas as pd
# Create two DataFrames
df1 = pd.DataFrame({
'A': ['A0', 'A1', 'A2'],
'B': ['B0', 'B1', 'B2']
})
df2 = pd.DataFrame({
'C': ['C0', 'C1', 'C2'],
'D': ['D0', 'D1', 'D2']
})
# Concatenate DataFrames with different columns
result = pd.concat([df1, df2], sort=True)
print(result)
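Output: the result has columns A, B, C, D and six rows labeled 0, 1, 2, 0, 1, 2. Rows from df1 hold NaN in C and D; rows from df2 hold NaN in A and B.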
Example 6: Vertical Concatenation with Non-Alignment Handling
import pandas as pd
# Create two DataFrames
df1 = pd.DataFrame({
'A': ['A0', 'A1', 'A2'],
'B': ['B0', 'B1', 'B2']
}, index=[0, 1, 2])
df2 = pd.DataFrame({
'A': ['A3', 'A4', 'A5'],
'B': ['B3', 'B4', 'B5']
}, index=[2, 3, 4])
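# Concatenate vertically; join='outer' applies to the columns, not to the row index
result = pd.concat([df1, df2], axis=0, join='outer')
print(result)
Output: six rows labeled 0, 1, 2, 2, 3, 4. The label 2 appears twice because vertical concatenation does not deduplicate row labels.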
Example 7: Inner Join on Horizontal Concatenation
import pandas as pd
# Create two DataFrames
df1 = pd.DataFrame({
'A': ['A0', 'A1', 'A2'],
'B': ['B0', 'B1', 'B2']
}, index=[0, 1, 2])
df2 = pd.DataFrame({
'C': ['C0', 'C1', 'C2'],
'D': ['D0', 'D1', 'D2']
}, index=[1, 2, 3])
# Concatenate DataFrames horizontally with inner join
result = pd.concat([df1, df2], axis=1, join='inner')
print(result)
Output: only the shared row labels 1 and 2 remain, each with all four columns A, B, C, D.
Example 8: Concatenation with Verification of Integrity
import pandas as pd
# Create two DataFrames
df1 = pd.DataFrame({
'A': ['A0', 'A1', 'A2'],
'B': ['B0', 'B1', 'B2']
}, index=[0, 1, 2])
df2 = pd.DataFrame({
'A': ['A3', 'A4', 'A5'],
'B': ['B3', 'B4', 'B5']
}, index=[2, 3, 4])
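# Concatenate with verification of integrity.
# The label 2 appears in both indexes, so pandas raises a ValueError.
try:
    result = pd.concat([df1, df2], verify_integrity=True)
    print(result)
except ValueError as err:
    print(f"Concatenation failed: {err}")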
Example 9: Concatenation with MultiIndex and Custom Names
import pandas as pd
# Create two DataFrames
df1 = pd.DataFrame({
'A': ['A0', 'A1', 'A2'],
'B': ['B0', 'B1', 'B2']
})
df2 = pd.DataFrame({
'A': ['A3', 'A4', 'A5'],
'B': ['B3', 'B4', 'B5']
})
# Concatenate with MultiIndex and custom names
result = pd.concat([df1, df2], keys=['pandasdataframe.com1', 'pandasdataframe.com2'], names=['Source', 'Row ID'])
print(result)
Output: six rows under a two-level MultiIndex whose levels are named Source and Row ID.
Example 10: Handling Missing Data in Concatenation
import pandas as pd
# Create two DataFrames with missing data
df1 = pd.DataFrame({
'A': ['A0', 'A1', 'A2'],
'B': ['B0', None, 'B2']
})
df2 = pd.DataFrame({
'A': [None, 'A4', 'A5'],
'B': ['B3', 'B4', 'B5']
})
# Concatenate DataFrames vertically
result = pd.concat([df1, df2])
print(result)
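Output: six rows labeled 0, 1, 2, 0, 1, 2. The missing values in column B of df1 and column A of df2 pass through unchanged; concat neither fills nor drops them.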
Example 11: Concatenation with Different Columns and Outer Join
import pandas as pd
# Create two DataFrames with different columns
df1 = pd.DataFrame({
'A': ['A0', 'A1', 'A2'],
'B': ['B0', 'B1', 'B2']
})
df2 = pd.DataFrame({
'C': ['C0', 'C1', 'C2'],
'D': ['D0', 'D1', 'D2']
})
# Concatenate DataFrames horizontally with outer join
result = pd.concat([df1, df2], axis=1, join='outer')
print(result)
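Output: three rows labeled 0 through 2 with columns A, B, C, D. Both inputs share the same default index, so no NaN values appear.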
Example 12: Concatenation with Custom Index
import pandas as pd
# Create two DataFrames
df1 = pd.DataFrame({
'A': ['A0', 'A1', 'A2'],
'B': ['B0', 'B1', 'B2']
}, index=['pandasdataframe.com0', 'pandasdataframe.com1', 'pandasdataframe.com2'])
df2 = pd.DataFrame({
'C': ['C0', 'C1', 'C2'],
'D': ['D0', 'D1', 'D2']
}, index=['pandasdataframe.com1', 'pandasdataframe.com2', 'pandasdataframe.com3'])
# Concatenate DataFrames horizontally
result = pd.concat([df1, df2], axis=1)
print(result)
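Output: four rows labeled pandasdataframe.com0 through pandasdataframe.com3. The first row has NaN in C and D, and the last row has NaN in A and B.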
Example 13: Concatenation with Sorting
import pandas as pd
# Create two DataFrames
df1 = pd.DataFrame({
'B': ['B2', 'B3', 'B1'],
'A': ['A2', 'A3', 'A1']
}, index=[2, 3, 1])
df2 = pd.DataFrame({
'B': ['B5', 'B4', 'B6'],
'A': ['A5', 'A4', 'A6']
}, index=[5, 4, 6])
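# Concatenate vertically; sort=True applies to the column axis, not to the rows
result = pd.concat([df1, df2], sort=True)
print(result)
Output: the rows keep their incoming order (index 2, 3, 1, 5, 4, 6). sort=True does not reorder them.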
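Because the sort parameter never touches the concatenation axis, ordering the rows is a separate step. A minimal sketch using the same frames as this example, with DataFrame.sort_index() applied afterwards:

```python
import pandas as pd

df1 = pd.DataFrame({"B": ["B2", "B3", "B1"], "A": ["A2", "A3", "A1"]}, index=[2, 3, 1])
df2 = pd.DataFrame({"B": ["B5", "B4", "B6"], "A": ["A5", "A4", "A6"]}, index=[5, 4, 6])

# concat keeps the incoming row order, even with sort=True
result = pd.concat([df1, df2], sort=True)
print(list(result.index))  # [2, 3, 1, 5, 4, 6]

# Sorting the rows by label is done explicitly after concatenation
ordered = result.sort_index()
print(list(ordered.index))  # [1, 2, 3, 4, 5, 6]
```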
Example 14: Concatenation with Copy False
import pandas as pd
# Create two DataFrames
df1 = pd.DataFrame({
'A': ['A0', 'A1', 'A2'],
'B': ['B0', 'B1', 'B2']
})
df2 = pd.DataFrame({
'A': ['A3', 'A4', 'A5'],
'B': ['B3', 'B4', 'B5']
})
# Concatenate without copying data
result = pd.concat([df1, df2], copy=False)
print(result)
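Output: the same six rows as a default concatenation, labeled 0, 1, 2, 0, 1, 2. The copy argument affects only how memory is handled, not the values in the result.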
Example 15: Concatenation with Hierarchical Index Using Levels
import pandas as pd
# Create two DataFrames
df1 = pd.DataFrame({
'A': ['A0', 'A1', 'A2'],
'B': ['B0', 'B1', 'B2']
})
df2 = pd.DataFrame({
'A': ['A3', 'A4', 'A5'],
'B': ['B3', 'B4', 'B5']
})
# Concatenate with hierarchical index using levels
result = pd.concat([df1, df2], keys=['Level1', 'Level2'], levels=[['Level1', 'Level2', 'Level3']], names=['Level'])
print(result)
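Output: six rows under a two-level MultiIndex whose outer level is named Level. That level is built from the supplied list, so 'Level3' is a valid level value even though no row uses it.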
Example 16: Concatenation with Non-Unique Indexes
import pandas as pd
# Create two DataFrames with non-unique indexes
df1 = pd.DataFrame({
'A': ['A0', 'A1', 'A2'],
'B': ['B0', 'B1', 'B2']
}, index=[1, 2, 2])
df2 = pd.DataFrame({
'A': ['A3', 'A4', 'A5'],
'B': ['B3', 'B4', 'B5']
}, index=[2, 3, 3])
# Concatenate DataFrames with non-unique indexes
result = pd.concat([df1, df2])
print(result)
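Output: six rows labeled 1, 2, 2, 2, 3, 3. Duplicate labels are allowed unless verify_integrity=True is passed.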
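Duplicate labels are easy to miss in printed output, so it can help to check for them explicitly. The following sketch reuses the index values from this example with a single column, then shows one possible remedy; whether renumbering is appropriate depends on whether the labels carry meaning in your data.

```python
import pandas as pd

df1 = pd.DataFrame({"A": ["A0", "A1", "A2"]}, index=[1, 2, 2])
df2 = pd.DataFrame({"A": ["A3", "A4", "A5"]}, index=[2, 3, 3])

result = pd.concat([df1, df2])

# The combined index contains repeated labels
print(result.index.is_unique)  # False

# Selecting a repeated label returns every matching row
print(len(result.loc[2]))  # 3

# One option: discard the labels and renumber the rows
renumbered = result.reset_index(drop=True)
print(renumbered.index.is_unique)  # True
```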
Example 17: Concatenation Using a Mapping
import pandas as pd
# Create a dictionary of DataFrames
data_frames = {
'df1': pd.DataFrame({
'A': ['A0', 'A1', 'A2'],
'B': ['B0', 'B1', 'B2']
}),
'df2': pd.DataFrame({
'A': ['A3', 'A4', 'A5'],
'B': ['B3', 'B4', 'B5']
})
}
# Concatenate using a mapping
result = pd.concat(data_frames)
print(result)
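Output: the dictionary keys 'df1' and 'df2' become the outer level of a MultiIndex, as if they had been passed through the keys parameter.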
Example 18: Concatenation with Different Column Orders
import pandas as pd
# Create two DataFrames with different column orders
df1 = pd.DataFrame({
'A': ['A0', 'A1', 'A2'],
'B': ['B0', 'B1', 'B2']
})
df2 = pd.DataFrame({
'B': ['B3', 'B4', 'B5'],
'A': ['A3', 'A4', 'A5']
})
# Concatenate DataFrames with different column orders
result = pd.concat([df1, df2])
print(result)
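Output: columns are matched by name, not by position. The result uses the column order of df1 (A, B), and each value from df2 lands under its own label.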
Example 19: Concatenation with All Parameters Used
import pandas as pd
# Create two DataFrames
df1 = pd.DataFrame({
'A': ['A0', 'A1', 'A2'],
'B': ['B0', 'B1', 'B2']
})
df2 = pd.DataFrame({
'A': ['A3', 'A4', 'A5'],
'B': ['B3', 'B4', 'B5']
})
# Concatenate using all parameters
result = pd.concat([df1, df2], axis=0, join='outer', ignore_index=True, keys=['Group1', 'Group2'], levels=None, names=['Group'], verify_integrity=False, sort=False, copy=True)
print(result)
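Output: six rows labeled 0 through 5. Because ignore_index=True, the keys and names arguments have no visible effect on the result.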
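Passing every parameter at once hides an interaction: ignore_index=True replaces the concatenation axis with a fresh 0 to n - 1 index, so any keys are dropped. The sketch below compares the two calls on smaller frames; it reflects the behavior of the pandas versions this was written against and is worth re-checking on your own installation.

```python
import pandas as pd

df1 = pd.DataFrame({"A": ["A0", "A1"]})
df2 = pd.DataFrame({"A": ["A2", "A3"]})

# With keys alone, the result has a two-level MultiIndex
with_keys = pd.concat([df1, df2], keys=["Group1", "Group2"], names=["Group"])
print(with_keys.index.nlevels)  # 2

# Adding ignore_index=True discards the keys and renumbers the rows
both = pd.concat(
    [df1, df2], keys=["Group1", "Group2"], names=["Group"], ignore_index=True
)
print(list(both.index))  # [0, 1, 2, 3]
```

In practice, choose one or the other: keys to record where each row came from, or ignore_index to get a clean sequential index.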
Example 20: Concatenation with Empty DataFrame
import pandas as pd
# Create an empty DataFrame and a non-empty DataFrame
df_empty = pd.DataFrame()
df_non_empty = pd.DataFrame({
'A': ['A0', 'A1', 'A2'],
'B': ['B0', 'B1', 'B2']
})
# Concatenate an empty DataFrame with a non-empty DataFrame
result = pd.concat([df_empty, df_non_empty])
print(result)
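Output: the three rows and two columns of df_non_empty. The empty DataFrame contributes no rows or columns.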
These examples illustrate how pandas.concat() handles a range of concatenation scenarios: stacking rows, joining columns, building hierarchical indexes, and dealing with mismatched or duplicate labels. Working through them should make it easier to combine datasets from multiple sources in your own data analysis projects.