Pandas Concat Axis
Pandas is a powerful data manipulation library in Python, widely used in data analysis and data science. One of its essential functions is concat, which concatenates pandas objects along a particular axis. In this article, we explore the concat function in depth, focusing on its use with different axes.
Introduction to Pandas Concat
The concat function in Pandas combines two or more pandas data structures along a particular axis. It offers flexibility in how indexes are handled and works with Series and DataFrame objects. (The Panel type mentioned in older tutorials was removed in pandas 1.0.)
Syntax of Concat
The basic syntax of the concat function is as follows:
pandas.concat(objs, axis=0, join='outer', ignore_index=False, keys=None, levels=None, names=None, verify_integrity=False, sort=False, copy=True)
objs: A sequence or mapping of Series or DataFrame objects.
axis: {0/'index', 1/'columns'}, default 0. The axis to concatenate along.
join: {'inner', 'outer'}, default 'outer'. How to handle the indexes on the other axis (or axes).
ignore_index: boolean, default False. If True, do not use the index values on the concatenation axis; the resulting axis is labeled 0, ..., n - 1.
keys: sequence, default None. Builds a hierarchical index with the passed keys as the outermost level. If multiple levels are passed, it should contain tuples.
verify_integrity: boolean, default False. Check whether the new concatenated axis contains duplicates and raise an error if it does.
sort: boolean, default False. Sort the non-concatenation axis if it is not already aligned.
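Two items in the signature are easy to overlook: objs may be a mapping, and names labels the levels of the hierarchical index that results. The following is a minimal sketch under the assumption of two small frames; the variable names north and south and the level names region and row are illustrative and not part of the original examples.

```python
import pandas as pd

# Hypothetical example frames; "north" and "south" are illustrative names only
north = pd.DataFrame({"A": ["A0", "A1"]}, index=[0, 1])
south = pd.DataFrame({"A": ["A2", "A3"]}, index=[0, 1])

# Passing a mapping: the dict keys are used as the keys argument,
# and names labels the two levels of the resulting MultiIndex
result = pd.concat({"north": north, "south": south}, names=["region", "row"])
print(result)
print(list(result.index.names))
```

Passing a dict is equivalent to passing the values as objs and the dict keys as keys, which can be convenient when the inputs are already stored under meaningful labels.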
Example 1: Basic Concatenation of DataFrames
import pandas as pd
# Create two DataFrames
df1 = pd.DataFrame({
'A': ['A0', 'A1', 'A2', 'A3'],
'B': ['B0', 'B1', 'B2', 'B3'],
'C': ['C0', 'C1', 'C2', 'C3'],
'D': ['D0', 'D1', 'D2', 'D3']
}, index=[0, 1, 2, 3])
df2 = pd.DataFrame({
'A': ['A4', 'A5', 'A6', 'A7'],
'B': ['B4', 'B5', 'B6', 'B7'],
'C': ['C4', 'C5', 'C6', 'C7'],
'D': ['D4', 'D5', 'D6', 'D7']
}, index=[4, 5, 6, 7])
# Concatenate along the rows
result = pd.concat([df1, df2])
print(result)
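Output: a single DataFrame with eight rows (index 0 through 7) and columns A, B, C, D; the rows of df2 follow the rows of df1.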
Example 2: Concatenation with Axis=1
import pandas as pd
# Create two DataFrames
df1 = pd.DataFrame({
'A': ['A0', 'A1', 'A2', 'A3'],
'B': ['B0', 'B1', 'B2', 'B3']
}, index=[0, 1, 2, 3])
df2 = pd.DataFrame({
'C': ['C0', 'C1', 'C2', 'C3'],
'D': ['D0', 'D1', 'D2', 'D3']
}, index=[0, 1, 2, 3])
# Concatenate along the columns
result = pd.concat([df1, df2], axis=1)
print(result)
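Output: a DataFrame with four rows (index 0 through 3) and columns A, B, C, D, with the columns of df2 placed to the right of those of df1.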
Example 3: Handling Indexes with Ignore Index
import pandas as pd
# Create two DataFrames
df1 = pd.DataFrame({
'A': ['A0', 'A1', 'A2', 'A3'],
'B': ['B0', 'B1', 'B2', 'B3']
}, index=[0, 1, 2, 3])
df2 = pd.DataFrame({
'A': ['A4', 'A5', 'A6', 'A7'],
'B': ['B4', 'B5', 'B6', 'B7']
}, index=[0, 1, 2, 3])
# Concatenate and ignore the index
result = pd.concat([df1, df2], ignore_index=True)
print(result)
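Output: eight rows with columns A and B. The original labels 0 through 3 from both inputs are discarded and replaced by a fresh index 0 through 7.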
Example 4: Concatenation with Keys
import pandas as pd
# Create two DataFrames
df1 = pd.DataFrame({
'A': ['A0', 'A1', 'A2', 'A3'],
'B': ['B0', 'B1', 'B2', 'B3']
}, index=[0, 1, 2, 3])
df2 = pd.DataFrame({
'A': ['A4', 'A5', 'A6', 'A7'],
'B': ['B4', 'B5', 'B6', 'B7']
}, index=[4, 5, 6, 7])
# Concatenate with keys
result = pd.concat([df1, df2], keys=['df1', 'df2'])
print(result)
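Output: eight rows with a two-level row index. The outer level is the key ('df1' or 'df2') and the inner level is the original label (0 through 3 for df1, 4 through 7 for df2).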
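The main benefit of keys is that each input can be recovered from the combined result by its key. The sketch below uses shortened, two-row versions of the frames above (an assumption made to keep the example compact) and selects rows with loc.

```python
import pandas as pd

# Shortened versions of the frames from the example above
df1 = pd.DataFrame({"A": ["A0", "A1"], "B": ["B0", "B1"]}, index=[0, 1])
df2 = pd.DataFrame({"A": ["A4", "A5"], "B": ["B4", "B5"]}, index=[4, 5])

result = pd.concat([df1, df2], keys=["df1", "df2"])

# Select every row that came from the second input by its key
second_part = result.loc["df2"]
print(second_part)

# Select a single row with a (key, original label) tuple
row = result.loc[("df1", 1)]
print(row["A"])
```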
Example 5: Concatenation with Different Indexes
import pandas as pd
# Create two DataFrames
df1 = pd.DataFrame({
'A': ['A0', 'A1', 'A2', 'A3'],
'B': ['B0', 'B1', 'B2', 'B3']
}, index=[0, 1, 2, 3])
df2 = pd.DataFrame({
'C': ['C0', 'C1', 'C2', 'C3'],
'D': ['D0', 'D1', 'D2', 'D3']
}, index=[2, 3, 4, 5])
# Concatenate along the columns with different indexes
result = pd.concat([df1, df2], axis=1)
print(result)
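Output: six rows (index 0 through 5), the union of both indexes. Columns C and D are NaN for labels 0 and 1, and columns A and B are NaN for labels 4 and 5.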
Example 6: Inner Join on Concatenation
import pandas as pd
# Create two DataFrames
df1 = pd.DataFrame({
'A': ['A0', 'A1', 'A2', 'A3'],
'B': ['B0', 'B1', 'B2', 'B3']
}, index=[0, 1, 2, 3])
df2 = pd.DataFrame({
'A': ['A4', 'A5', 'A6', 'A7'],
'B': ['B4', 'B5', 'B6', 'B7']
}, index=[2, 3, 4, 5])
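# Concatenate with inner join; along axis=0 the join applies to the columns, not the rows
result = pd.concat([df1, df2], join='inner')
print(result)
Output: all eight rows with columns A and B, since both inputs have the same columns. The row labels are 0, 1, 2, 3, 2, 3, 4, 5, so labels 2 and 3 appear twice.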
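When concatenating along the rows, join decides which columns survive, so its effect only becomes visible when the inputs have different columns. The following sketch assumes two hypothetical frames, left and right, that share only column B, and compares the outer and inner results.

```python
import pandas as pd

# Hypothetical frames that share only column "B"
left = pd.DataFrame({"A": ["A0", "A1"], "B": ["B0", "B1"]})
right = pd.DataFrame({"B": ["B2", "B3"], "C": ["C2", "C3"]})

# Outer join keeps the union of the columns and fills the gaps with NaN
outer = pd.concat([left, right], join="outer", ignore_index=True)

# Inner join keeps only the columns present in every input
inner = pd.concat([left, right], join="inner", ignore_index=True)

print(list(outer.columns))
print(list(inner.columns))
```

With axis=1 the same rule applies to row labels instead: an inner join keeps only the index labels that all inputs share.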
Example 7: Concatenation with MultiIndex
import pandas as pd
# Create two DataFrames
df1 = pd.DataFrame({
'A': ['A0', 'A1', 'A2', 'A3'],
'B': ['B0', 'B1', 'B2', 'B3']
}, index=[0, 1, 2, 3])
df2 = pd.DataFrame({
'C': ['C0', 'C1', 'C2', 'C3'],
'D': ['D0', 'D1', 'D2', 'D3']
}, index=[0, 1, 2, 3])
# Concatenate with MultiIndex
result = pd.concat([df1, df2], keys=['first', 'second'], axis=1)
print(result)
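Output: four rows with two-level column labels: ('first', 'A'), ('first', 'B'), ('second', 'C'), ('second', 'D').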
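With keys and axis=1, the keys become the top level of the column labels, so a whole block or a single column can be selected afterwards. This is a minimal sketch using shortened two-row frames (an assumption for brevity).

```python
import pandas as pd

# Shortened versions of the frames from the example above
df1 = pd.DataFrame({"A": ["A0", "A1"], "B": ["B0", "B1"]})
df2 = pd.DataFrame({"C": ["C0", "C1"], "D": ["D0", "D1"]})

result = pd.concat([df1, df2], keys=["first", "second"], axis=1)

# A top-level key returns all columns that came from that input
first_block = result["first"]
print(first_block)

# A (key, column) tuple returns a single column as a Series
c_column = result[("second", "C")]
print(c_column)
```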
Example 8: Verifying Integrity on Concatenation
import pandas as pd
# Create two DataFrames
df1 = pd.DataFrame({
'A': ['A0', 'A1', 'A2', 'A3'],
'B': ['B0', 'B1', 'B2', 'B3']
}, index=[0, 1, 2, 3])
df2 = pd.DataFrame({
'A': ['A4', 'A5', 'A6', 'A7'],
'B': ['B4', 'B5', 'B6', 'B7']
}, index=[0, 1, 2, 3])
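# Attempt to concatenate and verify integrity
try:
    result = pd.concat([df1, df2], verify_integrity=True)
    print(result)
except ValueError as e:
    print("ValueError:", e)
Output: a single line beginning with "ValueError:" that reports the overlapping index values 0 through 3. The exact wording depends on the pandas version.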
Example 9: Concatenation with Sorting
import pandas as pd
# Create two DataFrames
df1 = pd.DataFrame({
'B': ['B0', 'B1', 'B2', 'B3'],
'A': ['A0', 'A1', 'A2', 'A3']
}, index=[0, 1, 2, 3])
df2 = pd.DataFrame({
'D': ['D0', 'D1', 'D2', 'D3'],
'C': ['C0', 'C1', 'C2', 'C3']
}, index=[0, 1, 2, 3])
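# sort=True sorts the non-concatenation axis; with axis=1 that is the row index, not the columns
result = pd.concat([df1, df2], axis=1, sort=True)
print(result)
Output: four rows (index 0 through 3) with the columns in their original order B, A, D, C. The row index is already aligned, so sort=True changes nothing here.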
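The sort parameter acts on the non-concatenation axis, so column labels are sorted only when concatenating along the rows (axis=0). The sketch below assumes shortened two-row frames with deliberately unordered columns and compares the default with sort=True.

```python
import pandas as pd

# Shortened frames with columns deliberately out of alphabetical order
df1 = pd.DataFrame({"B": ["B0", "B1"], "A": ["A0", "A1"]})
df2 = pd.DataFrame({"D": ["D0", "D1"], "C": ["C0", "C1"]})

# Default (sort=False): columns keep their order of appearance
unsorted = pd.concat([df1, df2], ignore_index=True)

# sort=True: the column labels (the non-concatenation axis) are sorted
sorted_cols = pd.concat([df1, df2], ignore_index=True, sort=True)

print(list(unsorted.columns))
print(list(sorted_cols.columns))
```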
Example 10: Concatenation with Copy
import pandas as pd
# Create two DataFrames
df1 = pd.DataFrame({
'A': ['A0', 'A1', 'A2', 'A3'],
'B': ['B0', 'B1', 'B2', 'B3']
}, index=[0, 1, 2, 3])
df2 = pd.DataFrame({
'C': ['C0', 'C1', 'C2', 'C3'],
'D': ['D0', 'D1', 'D2', 'D3']
}, index=[0, 1, 2, 3])
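# Concatenate along the rows; copy=False asks pandas to avoid unnecessary copies where it can
result = pd.concat([df1, df2], copy=False)
print(result)
Output: eight rows (labels 0 through 3, twice) with columns A, B, C, D. Because the inputs share no columns, each row has NaN in the columns that came from the other DataFrame.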
Example 11: Concatenation with Different Column Names
import pandas as pd
# Create two DataFrames
df1 = pd.DataFrame({
'A': ['A0', 'A1', 'A2', 'A3'],
'B': ['B0', 'B1', 'B2', 'B3']
}, index=[0, 1, 2, 3])
df2 = pd.DataFrame({
'C': ['C0', 'C1', 'C2', 'C3'],
'D': ['D0', 'D1', 'D2', 'D3']
}, index=[0, 1, 2, 3])
# Concatenate along the columns with different column names
result = pd.concat([df1, df2], axis=1)
print(result)
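Output: four rows (index 0 through 3) with columns A, B, C, D. Because the column names do not overlap, no labels are duplicated.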
Example 12: Concatenation with Non-Unique Index
import pandas as pd
# Create two DataFrames
df1 = pd.DataFrame({
'A': ['A0', 'A1', 'A2', 'A3'],
'B': ['B0', 'B1', 'B2', 'B3']
}, index=[0, 1, 2, 3])
df2 = pd.DataFrame({
'A': ['A4', 'A5', 'A6', 'A7'],
'B': ['B4', 'B5', 'B6', 'B7']
}, index=[2, 3, 4, 5])
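# Concatenate and keep the original labels, producing a non-unique index
result = pd.concat([df1, df2], ignore_index=False)
print(result)
Output: eight rows with columns A and B and row labels 0, 1, 2, 3, 2, 3, 4, 5. Labels 2 and 3 each refer to two rows.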
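A non-unique index is legal but changes how label lookups behave, so it is worth checking for after a concatenation. The following sketch, which uses single-column versions of the frames above (an assumption for brevity), shows one way to detect the duplicates and what a lookup on a repeated label returns.

```python
import pandas as pd

# Single-column versions of the frames from the example above
df1 = pd.DataFrame({"A": ["A0", "A1", "A2", "A3"]}, index=[0, 1, 2, 3])
df2 = pd.DataFrame({"A": ["A4", "A5", "A6", "A7"]}, index=[2, 3, 4, 5])

result = pd.concat([df1, df2])

# False, because labels 2 and 3 occur twice
print(result.index.is_unique)

# Labels that repeat an earlier label
duplicated_labels = result.index[result.index.duplicated()].tolist()
print(duplicated_labels)

# Looking up a repeated label returns every matching row as a DataFrame
rows_for_label_2 = result.loc[2]
print(rows_for_label_2)
```

If unique labels are required, passing ignore_index=True or verify_integrity=True, as in the earlier examples, either avoids or flags the problem.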
Example 13: Concatenation with DataFrame and Series
import pandas as pd
# Create a DataFrame and a Series
df = pd.DataFrame({
'A': ['A0', 'A1', 'A2', 'A3'],
'B': ['B0', 'B1', 'B2', 'B3']
}, index=[0, 1, 2, 3])
series = pd.Series(['S0', 'S1', 'S2', 'S3'], name='S')
# Concatenate DataFrame and Series along columns
result = pd.concat([df, series], axis=1)
print(result)
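Output: four rows with columns A, B, and S. The Series name becomes the column label, and its default index 0 through 3 aligns with the DataFrame's index.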
Example 14: Concatenation with Different DataTypes
import pandas as pd
# Create two DataFrames with different data types
df1 = pd.DataFrame({
'A': [1, 2, 3, 4],
'B': [5, 6, 7, 8]
}, index=[0, 1, 2, 3])
df2 = pd.DataFrame({
'C': [1.1, 2.2, 3.3, 4.4],
'D': [5.5, 6.6, 7.7, 8.8]
}, index=[0, 1, 2, 3])
# Concatenate along the columns with different data types
result = pd.concat([df1, df2], axis=1)
print(result)
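Output: four rows with columns A, B, C, D. A and B keep their integer dtype and C and D keep their float dtype, because each column is carried over unchanged.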
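Data types matter more when concatenating along the rows, because values from different inputs then end up in the same column. The sketch below assumes two hypothetical frames, ints and floats, with a shared column named value, and shows the resulting dtypes.

```python
import pandas as pd

# Hypothetical frames with the same column name but different dtypes
ints = pd.DataFrame({"value": [1, 2]})
floats = pd.DataFrame({"value": [1.5, 2.5]})

# Integers and floats in one column are upcast to a common float dtype
stacked = pd.concat([ints, floats], ignore_index=True)
print(stacked["value"].dtype)

# Columns missing from one input are filled with NaN,
# which also turns default integer columns into float columns
partial = pd.concat([ints, pd.DataFrame({"other": [10, 20]})], ignore_index=True)
print(partial.dtypes)
```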
Example 15: Concatenation with Multi-Level Columns
import pandas as pd
# Create two DataFrames with multi-level columns
df1 = pd.DataFrame({
('Group1', 'A'): ['A0', 'A1', 'A2', 'A3'],
('Group1', 'B'): ['B0', 'B1', 'B2', 'B3']
}, index=[0, 1, 2, 3])
df2 = pd.DataFrame({
('Group2', 'C'): ['C0', 'C1', 'C2', 'C3'],
('Group2', 'D'): ['D0', 'D1', 'D2', 'D3']
}, index=[0, 1, 2, 3])
# Concatenate along the columns with multi-level columns
result = pd.concat([df1, df2], axis=1)
print(result)
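Output: four rows with two-level column labels: ('Group1', 'A'), ('Group1', 'B'), ('Group2', 'C'), ('Group2', 'D').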
This exploration of the concat function shows how its parameters control the way pandas objects are combined: axis chooses the direction, join and sort govern the other axis, and ignore_index, keys, and verify_integrity control the resulting labels. Understanding these options lets you combine multiple datasets into a single structure for easier analysis and manipulation.