Pandas Concat DataFrame
Pandas is a powerful Python library for data manipulation and analysis. One of its most useful features is the ability to concatenate DataFrames. Concatenation combines two or more DataFrames along a particular axis (rows or columns) to create a new DataFrame. This article explores the various ways to use the pd.concat() function to concatenate DataFrames, with detailed examples.
Understanding pd.concat()
The pd.concat() function in Pandas is versatile and can be used to combine multiple DataFrames or Series objects horizontally (side by side) or vertically (stacked on top of each other). The basic syntax of pd.concat() is:
pandas.concat(objs, axis=0, join='outer', ignore_index=False, keys=None, levels=None, names=None, verify_integrity=False, sort=False, copy=True)
objs: a sequence or mapping of Series or DataFrame objects.
axis: {0/'index', 1/'columns'}, default 0. The axis to concatenate along.
join: {'inner', 'outer'}, default 'outer'. How to handle indexes on the other axis (or axes).
ignore_index: bool, default False. If True, do not use the index values on the concatenation axis.
keys: sequence, default None. If multiple levels passed, should contain tuples.
verify_integrity: bool, default False. Check whether the new concatenated axis contains duplicates.
sort: bool, default False. Sort non-concatenation axis if it is not already aligned.
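As a quick illustration of the sort parameter described above, here is a minimal sketch. The frames first and second are made up for this illustration; their columns are deliberately out of alphabetical order so that the effect on the non-concatenation axis is visible.

```python
import pandas as pd

# Two illustrative frames whose column sets differ and are not alphabetical
first = pd.DataFrame({'B': ['B0'], 'A': ['A0']})
second = pd.DataFrame({'C': ['C1'], 'A': ['A1']})

# sort=False keeps the columns in order of first appearance
stacked = pd.concat([first, second], axis=0, sort=False, ignore_index=True)
print(list(stacked.columns))  # ['B', 'A', 'C']

# sort=True orders the non-concatenation axis (here, the columns)
stacked_sorted = pd.concat([first, second], axis=0, sort=True, ignore_index=True)
print(list(stacked_sorted.columns))  # ['A', 'B', 'C']
```

In both cases the rows are the same; only the column order changes.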
Example 1: Basic Vertical Concatenation
import pandas as pd
# Create two DataFrames
df1 = pd.DataFrame({
'A': ['A0', 'A1', 'A2', 'A3'],
'B': ['B0', 'B1', 'B2', 'B3'],
'C': ['C0', 'C1', 'C2', 'C3'],
'D': ['D0', 'D1', 'D2', 'D3']
}, index=[0, 1, 2, 3])
df2 = pd.DataFrame({
'A': ['A4', 'A5', 'A6', 'A7'],
'B': ['B4', 'B5', 'B6', 'B7'],
'C': ['C4', 'C5', 'C6', 'C7'],
'D': ['D4', 'D5', 'D6', 'D7']
}, index=[4, 5, 6, 7])
# Concatenate DataFrames
result = pd.concat([df1, df2])
print(result)
Output: a single DataFrame with eight rows, index 0 to 7, and columns A, B, C and D.
Example 2: Horizontal Concatenation with Different Indices
import pandas as pd
# Create two DataFrames with different indices
df1 = pd.DataFrame({
'A': ['A0', 'A1', 'A2'],
'B': ['B0', 'B1', 'B2']
}, index=[0, 1, 2])
df2 = pd.DataFrame({
'C': ['C0', 'C1', 'C2'],
'D': ['D0', 'D1', 'D2']
}, index=[1, 2, 3])
# Concatenate DataFrames along columns
result = pd.concat([df1, df2], axis=1)
print(result)
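Output: a DataFrame with index 0 to 3 and columns A, B, C and D. Row 0 has NaN in C and D, and row 3 has NaN in A and B, because those labels exist in only one of the inputs.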
Example 3: Concatenation with Inner Join
import pandas as pd
# Create two DataFrames
df1 = pd.DataFrame({
'A': ['A0', 'A1', 'A2'],
'B': ['B0', 'B1', 'B2']
}, index=[0, 1, 2])
df2 = pd.DataFrame({
'A': ['A3', 'A4', 'A5'],
'B': ['B3', 'B4', 'B5']
}, index=[2, 3, 4])
# Concatenate with inner join
result = pd.concat([df1, df2], axis=0, join='inner')
print(result)
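Output: a six-row DataFrame with columns A and B and index 0, 1, 2, 2, 3, 4. Both inputs have the same columns, so the inner join drops nothing, and index label 2 appears twice.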
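The effect of join is easier to see when the inputs do not share all of their columns. The following is a minimal sketch using two made-up frames, left and right, that have only column B in common.

```python
import pandas as pd

# Illustrative frames that share only column 'B'
left = pd.DataFrame({'A': ['A0', 'A1'], 'B': ['B0', 'B1']})
right = pd.DataFrame({'B': ['B2', 'B3'], 'C': ['C2', 'C3']})

# Outer join (the default) keeps every column and fills the gaps with NaN
outer = pd.concat([left, right], join='outer', ignore_index=True)
print(list(outer.columns))  # ['A', 'B', 'C']

# Inner join keeps only the columns present in every input
inner = pd.concat([left, right], join='inner', ignore_index=True)
print(list(inner.columns))  # ['B']
```

When stacking rows, join applies to the columns; when concatenating with axis=1, it applies to the row index instead.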
Example 4: Concatenation with Keys
import pandas as pd
# Create two DataFrames
df1 = pd.DataFrame({
'A': ['A0', 'A1', 'A2'],
'B': ['B0', 'B1', 'B2']
})
df2 = pd.DataFrame({
'A': ['A3', 'A4', 'A5'],
'B': ['B3', 'B4', 'B5']
})
# Concatenate with keys
result = pd.concat([df1, df2], keys=['x', 'y'])
print(result)
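Output: a six-row DataFrame with a two-level index. The outer level holds the keys x and y, and the inner level holds the original row labels 0 to 2 of each input.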
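One reason to pass keys is that each input can later be pulled back out by its label. A short sketch of that lookup, reusing the same data as Example 4 (the variable names y_rows and single are only for illustration):

```python
import pandas as pd

df1 = pd.DataFrame({'A': ['A0', 'A1', 'A2'], 'B': ['B0', 'B1', 'B2']})
df2 = pd.DataFrame({'A': ['A3', 'A4', 'A5'], 'B': ['B3', 'B4', 'B5']})

result = pd.concat([df1, df2], keys=['x', 'y'])

# Select every row that came from the second input
y_rows = result.loc['y']
print(y_rows)

# Select one cell by (key, original row label) and column name
single = result.loc[('y', 1), 'A']
print(single)  # A4
```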
Example 5: Ignoring the Index
import pandas as pd
# Create two DataFrames
df1 = pd.DataFrame({
'A': ['A0', 'A1', 'A2'],
'B': ['B0', 'B1', 'B2']
})
df2 = pd.DataFrame({
'A': ['A3', 'A4', 'A5'],
'B': ['B3', 'B4', 'B5']
})
# Concatenate ignoring the index
result = pd.concat([df1, df2], ignore_index=True)
print(result)
Output: a six-row DataFrame whose index is renumbered 0 to 5 instead of repeating 0, 1, 2.
Example 6: Adding MultiIndex Keys
import pandas as pd
# Create two DataFrames
df1 = pd.DataFrame({
'A': ['A0', 'A1', 'A2'],
'B': ['B0', 'B1', 'B2']
})
df2 = pd.DataFrame({
'A': ['A3', 'A4', 'A5'],
'B': ['B3', 'B4', 'B5']
})
# Concatenate with MultiIndex keys
result = pd.concat([df1, df2], keys=['first', 'second'], names=['Source', 'Row'])
print(result)
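Output: a six-row DataFrame with a two-level index whose levels are named Source (values first and second) and Row (the original labels 0 to 2).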
Example 7: Concatenation with Different Columns
import pandas as pd
# Create two DataFrames with different columns
df1 = pd.DataFrame({
'A': ['A0', 'A1', 'A2'],
'B': ['B0', 'B1', 'B2']
})
df2 = pd.DataFrame({
'C': ['C0', 'C1', 'C2'],
'D': ['D0', 'D1', 'D2']
})
# Concatenate DataFrames with different columns
result = pd.concat([df1, df2], sort=False)
print(result)
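Output: a six-row DataFrame with columns A, B, C and D and index 0, 1, 2, 0, 1, 2. The rows from df1 have NaN in C and D, and the rows from df2 have NaN in A and B.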
Example 8: Verifying Integrity
import pandas as pd
# Create two DataFrames
df1 = pd.DataFrame({
'A': ['A0', 'A1', 'A2'],
'B': ['B0', 'B1', 'B2']
}, index=[0, 1, 2])
df2 = pd.DataFrame({
'A': ['A3', 'A4', 'A5'],
'B': ['B3', 'B4', 'B5']
}, index=[2, 3, 4])
# Concatenate with integrity check
try:
    result = pd.concat([df1, df2], verify_integrity=True)
    print(result)
except ValueError as e:
    print("ValueError:", e)
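Output: the concatenation raises a ValueError because index label 2 appears in both DataFrames, so the except branch prints a message reporting overlapping index values.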
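If an error is not wanted, one possible approach (an assumption about the workflow, not the only option) is to detect duplicate labels after concatenating and renumber the rows with ignore_index. A minimal sketch:

```python
import pandas as pd

df1 = pd.DataFrame({'A': ['A0', 'A1', 'A2']}, index=[0, 1, 2])
df2 = pd.DataFrame({'A': ['A3', 'A4', 'A5']}, index=[2, 3, 4])

# Without verify_integrity the duplicate label is silently kept
combined = pd.concat([df1, df2])
print(combined.index.has_duplicates)  # True

# ignore_index=True discards the original labels and renumbers the rows
renumbered = pd.concat([df1, df2], ignore_index=True)
print(renumbered.index.has_duplicates)  # False
```

Renumbering only makes sense when the original labels carry no meaning; if they identify records, the duplicates need to be resolved in the data instead.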
Example 9: Concatenation with No Copy
import pandas as pd
# Create two DataFrames
df1 = pd.DataFrame({
'A': ['A0', 'A1', 'A2'],
'B': ['B0', 'B1', 'B2']
})
df2 = pd.DataFrame({
'A': ['A3', 'A4', 'A5'],
'B': ['B3', 'B4', 'B5']
})
# Ask pandas to avoid unnecessary copies; copy=False is a hint, not a guarantee
result = pd.concat([df1, df2], copy=False)
print(result)
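Output: a six-row DataFrame with columns A and B and index 0, 1, 2, 0, 1, 2, the same values as a default concatenation.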
Example 10: Handling Non-Alignment with reindex
import pandas as pd
# Create two DataFrames with non-aligned indices
df1 = pd.DataFrame({
'A': ['A0', 'A1', 'A2'],
'B': ['B0', 'B1', 'B2']
}, index=[0, 1, 2])
df2 = pd.DataFrame({
'C': ['C0', 'C1', 'C2'],
'D': ['D0', 'D1', 'D2']
}, index=[1, 2, 3])
# Reindex df2 before concatenation
df2_reindexed = df2.reindex(df1.index)
# Concatenate DataFrames along columns
result = pd.concat([df1, df2_reindexed], axis=1)
print(result)
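Output: a three-row DataFrame with index 0 to 2. Row 0 has NaN in C and D because df2 has no label 0, and the df2 row labelled 3 is dropped by the reindex.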
Example 11: Concatenation with Different Column Orders
import pandas as pd
# Create two DataFrames with different column orders
df1 = pd.DataFrame({
'A': ['A0', 'A1', 'A2'],
'B': ['B0', 'B1', 'B2']
})
df2 = pd.DataFrame({
'B': ['B3', 'B4', 'B5'],
'A': ['A3', 'A4', 'A5']
})
# Concatenate DataFrames
result = pd.concat([df1, df2])
print(result)
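Output: a six-row DataFrame with columns in the order A, B. Columns are matched by name, so the values from df2 land in the correct columns even though it was defined with B first.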
Example 12: Replacing append with concat
import pandas as pd
# Create two DataFrames
df1 = pd.DataFrame({
'A': ['A0', 'A1', 'A2'],
'B': ['B0', 'B1', 'B2']
})
df2 = pd.DataFrame({
'A': ['A3', 'A4', 'A5'],
'B': ['B3', 'B4', 'B5']
})
# DataFrame.append was deprecated in pandas 1.4 and removed in 2.0; use pd.concat instead
result = pd.concat([df1, df2])
print(result)
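A common pattern when rows arrive in pieces is to collect the pieces in a Python list and concatenate once at the end, rather than growing a DataFrame inside a loop. The sketch below assumes three made-up batches; the column names batch and value are illustrative only.

```python
import pandas as pd

# Collect each piece in a plain list first
pieces = []
for batch in range(3):
    piece = pd.DataFrame({
        'batch': [batch, batch],
        'value': [batch * 10, batch * 10 + 1],
    })
    pieces.append(piece)  # list.append, not DataFrame.append

# Concatenate all pieces in one call
result = pd.concat(pieces, ignore_index=True)
print(result)
```

A single pd.concat call avoids building a new DataFrame on every iteration of the loop.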
Example 13: Concatenation with Series
import pandas as pd
# Create a DataFrame and a Series
df = pd.DataFrame({
'A': ['A0', 'A1', 'A2'],
'B': ['B0', 'B1', 'B2']
})
series = pd.Series(['S0', 'S1', 'S2'], name='S')
# Concatenate DataFrame with Series
result = pd.concat([df, series], axis=1)
print(result)
Output: a three-row DataFrame with columns A, B and S, where the column S takes its name from the Series.
Example 14: Handling NaN Values After Concatenation
import pandas as pd
# Create two DataFrames with non-aligned indices
df1 = pd.DataFrame({
'A': ['A0', 'A1', 'A2'],
'B': ['B0', 'B1', 'B2']
}, index=[0, 1, 2])
df2 = pd.DataFrame({
'C': ['C0', 'C1', 'C2'],
'D': ['D0', 'D1', 'D2']
}, index=[2, 3, 4])
# Concatenate DataFrames along columns
result = pd.concat([df1, df2], axis=1)
print(result)
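Output: a DataFrame with index 0 to 4. Rows 0 and 1 have NaN in C and D, row 2 is complete, and rows 3 and 4 have NaN in A and B.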
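The concatenation above produces the NaN values but does not yet deal with them. Two common follow-ups are fillna and dropna, sketched here on the same data; the placeholder string 'missing' is an arbitrary choice for illustration.

```python
import pandas as pd

df1 = pd.DataFrame({'A': ['A0', 'A1', 'A2'], 'B': ['B0', 'B1', 'B2']}, index=[0, 1, 2])
df2 = pd.DataFrame({'C': ['C0', 'C1', 'C2'], 'D': ['D0', 'D1', 'D2']}, index=[2, 3, 4])

result = pd.concat([df1, df2], axis=1)

# Replace every NaN with a placeholder value
filled = result.fillna('missing')
print(filled)

# Or keep only the rows that are complete in every column
complete = result.dropna()
print(complete)
```

Here only the row labelled 2 exists in both inputs, so it is the only row that survives dropna.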
Example 15: Concatenation with Custom Index
import pandas as pd
# Create two DataFrames
df1 = pd.DataFrame({
'A': ['A0', 'A1', 'A2'],
'B': ['B0', 'B1', 'B2']
}, index=['first', 'second', 'third'])
df2 = pd.DataFrame({
'C': ['C0', 'C1', 'C2'],
'D': ['D0', 'D1', 'D2']
}, index=['second', 'third', 'fourth'])
# Concatenate DataFrames along rows
result = pd.concat([df1, df2], axis=0, sort=False)
print(result)
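Output: a six-row DataFrame with index first, second, third, second, third, fourth and columns A, B, C and D. The rows from df1 have NaN in C and D, and the rows from df2 have NaN in A and B.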
Example 16: Concatenation with DataFrames Having MultiIndex
import pandas as pd
# Create two DataFrames with MultiIndex
index = pd.MultiIndex.from_tuples([('one', 'a'), ('one', 'b'), ('two', 'a'), ('two', 'b')])
df1 = pd.DataFrame({
'A': ['A0', 'A1', 'A2', 'A3'],
'B': ['B0', 'B1', 'B2', 'B3']
}, index=index)
df2 = pd.DataFrame({
'C': ['C0', 'C1', 'C2', 'C3'],
'D': ['D0', 'D1', 'D2', 'D3']
}, index=index)
# Concatenate DataFrames along columns
result = pd.concat([df1, df2], axis=1)
print(result)
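Output: a four-row DataFrame that keeps the shared two-level index and has columns A, B, C and D with no missing values.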
Example 17: Concatenation with Different DataTypes
import pandas as pd
# Create two DataFrames with different data types
df1 = pd.DataFrame({
'A': [1, 2, 3],
'B': [4.0, 5.0, 6.0]
})
df2 = pd.DataFrame({
'A': ['one', 'two', 'three'],
'B': ['four', 'five', 'six']
})
# Concatenate DataFrames
result = pd.concat([df1, df2])
print(result)
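Output: a six-row DataFrame in which each column holds the numbers from df1 followed by the strings from df2.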
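Concatenation can also change dtypes when the inputs are all numeric. A minimal sketch with made-up frames (ints, floats and other are illustrative names) shows two cases that both end in float64:

```python
import pandas as pd

ints = pd.DataFrame({'A': [1, 2]})
floats = pd.DataFrame({'A': [1.5, 2.5]})

# Integer and float columns combine into a float column
numeric = pd.concat([ints, floats], ignore_index=True)
print(numeric['A'].dtype)  # float64

# A column missing from one input is padded with NaN, which also forces float64
other = pd.DataFrame({'B': [10, 20]})
padded = pd.concat([ints, other], ignore_index=True)
print(padded['A'].dtype)  # float64
```

Checking result.dtypes after a concatenation is a cheap way to catch this kind of silent conversion.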
Example 18: Concatenation Using concat with Dictionary
import pandas as pd
# Create a dictionary of DataFrames
dfs = {
'df1': pd.DataFrame({
'A': ['A0', 'A1', 'A2'],
'B': ['B0', 'B1', 'B2']
}),
'df2': pd.DataFrame({
'A': ['A3', 'A4', 'A5'],
'B': ['B3', 'B4', 'B5']
})
}
# Concatenate DataFrames from dictionary
result = pd.concat(dfs)
print(result)
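Output: a six-row DataFrame with a two-level index. The dictionary keys df1 and df2 form the outer level, just as if they had been passed through keys.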
Example 19: Concatenation with Axis Labels
import pandas as pd
# Create two DataFrames
df1 = pd.DataFrame({
'A': ['A0', 'A1', 'A2'],
'B': ['B0', 'B1', 'B2']
})
df2 = pd.DataFrame({
'C': ['C0', 'C1', 'C2'],
'D': ['D0', 'D1', 'D2']
})
# Label each input's columns with an outer level via keys (pd.concat has no labels parameter)
result = pd.concat([df1, df2], axis=1, keys=['Group1', 'Group2'])
print(result)
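When keys is combined with axis=1, the labels become the outer level of a column MultiIndex, and each group can be selected by its label. A short sketch with the same data (group1 is an illustrative name):

```python
import pandas as pd

df1 = pd.DataFrame({'A': ['A0', 'A1', 'A2'], 'B': ['B0', 'B1', 'B2']})
df2 = pd.DataFrame({'C': ['C0', 'C1', 'C2'], 'D': ['D0', 'D1', 'D2']})

# keys on axis=1 adds an outer level to the column index
result = pd.concat([df1, df2], axis=1, keys=['Group1', 'Group2'])
print(list(result.columns))

# Selecting a key returns that input's columns
group1 = result['Group1']
print(list(group1.columns))  # ['A', 'B']
```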
Example 20: Concatenation with Mixed Dimensions
import pandas as pd
# Create a DataFrame and a Series
df = pd.DataFrame({
'A': ['A0', 'A1', 'A2'],
'B': ['B0', 'B1', 'B2']
})
series = pd.Series(['S0', 'S1', 'S2', 'S3'], name='S')
# Concatenate DataFrame with Series, treating Series as a DataFrame
result = pd.concat([df, series.to_frame()], axis=1)
print(result)
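Output: a four-row DataFrame with columns A, B and S. Row 3 has NaN in A and B because the DataFrame has only three rows while the Series has four.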
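If the extra Series row is not wanted, one option is to pass join='inner' so that only the row labels present in both objects are kept. A minimal sketch on the same data (trimmed is an illustrative name):

```python
import pandas as pd

df = pd.DataFrame({'A': ['A0', 'A1', 'A2'], 'B': ['B0', 'B1', 'B2']})
series = pd.Series(['S0', 'S1', 'S2', 'S3'], name='S')

# With axis=1, join='inner' keeps only row labels shared by every input
trimmed = pd.concat([df, series], axis=1, join='inner')
print(trimmed)
```

The result has three rows and no NaN values, because label 3 exists only in the Series.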
These examples show how to use the pd.concat() function to concatenate DataFrames in a range of scenarios. Each example is designed to be self-contained and directly runnable, and together they illustrate the flexibility of DataFrame concatenation in Pandas.