Pandas Concat
Pandas is a powerful data manipulation library for Python, widely used in data analysis and data science. One of its essential functions is concat, which concatenates pandas objects along a particular axis, with optional set logic (union or intersection) along the other axes. It accepts Series and DataFrame objects; the Panel objects supported by older releases have since been removed from pandas.
This article explores the concat function in detail and walks through its usage with a series of examples. Each example is standalone, so you can run it independently without any dependency on previous code snippets.
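Since concat accepts Series as well as DataFrame objects, a minimal sketch of the Series case may help before the DataFrame examples. The two Series and their names below are made up for illustration.

```python
import pandas as pd

# Hypothetical Series, used only for illustration.
s1 = pd.Series([1, 2], name='x')
s2 = pd.Series([3, 4], name='y')

# Along the default axis (axis=0) the values are stacked and the result is a Series.
stacked = pd.concat([s1, s2])

# Along axis=1 each Series becomes a column, named after the Series, of a DataFrame.
side_by_side = pd.concat([s1, s2], axis=1)

print(type(stacked).__name__)
print(type(side_by_side).__name__)
```

The type of the result therefore depends on both the inputs and the axis, which is worth keeping in mind when chaining further operations.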
Understanding Pandas Concat Function
The concat function in pandas is primarily used to combine several DataFrame or Series objects into a single object, usually a DataFrame. The syntax for the concat function is:
pandas.concat(objs, axis=0, join='outer', ignore_index=False, keys=None, levels=None, names=None, verify_integrity=False, sort=False, copy=True)
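- objs: A sequence or mapping of Series or DataFrame objects. If a mapping is passed, its keys are used as the keys argument.
- axis: The axis to concatenate along. 0 (the default) stacks the objects row-wise; 1 places them side by side as columns.
- join: How to handle the labels on the other axis (or axes). 'outer' (the default) keeps the union of the labels; 'inner' keeps only their intersection.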
- ignore_index: If True, do not use the index values on the concatenation axis.
- keys: Construct hierarchical index using the passed keys.
- levels: Specific levels (unique values) to use for constructing a MultiIndex.
- names: Names for the levels in the resulting hierarchical index.
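- verify_integrity: If True, check whether the new concatenated axis contains duplicates and raise a ValueError if it does.
- sort: If True, sort the non-concatenation axis when it is not already aligned. With the default of False, labels stay in their order of appearance.
- copy: If False, do not copy data unnecessarily. The default and the status of this parameter differ between pandas releases, so check the documentation for your version.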
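The examples in this article do not exercise the mapping form of objs, the names parameter, or verify_integrity, so here is a minimal sketch of those three. The data (quarterly sales by region) and all variable names are hypothetical and chosen only for illustration.

```python
import pandas as pd

# Hypothetical sample data, not taken from the article's examples.
q1 = pd.DataFrame({'sales': [10, 20]}, index=['north', 'south'])
q2 = pd.DataFrame({'sales': [30, 40]}, index=['north', 'south'])

# Passing a mapping: the dict keys play the role of `keys`, and `names`
# labels the two levels of the resulting hierarchical index.
combined = pd.concat({'Q1': q1, 'Q2': q2}, names=['quarter', 'region'])
print(combined)

# verify_integrity=True raises ValueError when the concatenated axis would
# contain duplicate labels ('north' and 'south' each appear twice here).
try:
    pd.concat([q1, q2], verify_integrity=True)
    duplicates_detected = False
except ValueError:
    duplicates_detected = True
print(duplicates_detected)
```

Using keys (or a mapping) is one way to avoid the duplicate labels that verify_integrity complains about, because the outer level makes every row label unique.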
Examples of Using Pandas Concat
Example 1: Basic Concatenation of Two DataFrames
import pandas as pd
df1 = pd.DataFrame({
'A': ['A0', 'A1', 'A2', 'A3'],
'B': ['B0', 'B1', 'B2', 'B3'],
'C': ['C0', 'C1', 'C2', 'C3'],
'D': ['D0', 'D1', 'D2', 'D3']
}, index=[0, 1, 2, 3])
df2 = pd.DataFrame({
'A': ['A4', 'A5', 'A6', 'A7'],
'B': ['B4', 'B5', 'B6', 'B7'],
'C': ['C4', 'C5', 'C6', 'C7'],
'D': ['D4', 'D5', 'D6', 'D7']
}, index=[4, 5, 6, 7])
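# Stack df2 below df1 along the default axis (axis=0); index labels are kept as-is
result = pd.concat([df1, df2])
print(result)
Output: a single DataFrame with eight rows (index 0 through 7) and columns A, B, C and D, with the rows of df2 following those of df1.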
Example 2: Concatenation with Axis Set to 1
import pandas as pd
df1 = pd.DataFrame({
'A': ['A0', 'A1', 'A2', 'A3'],
'B': ['B0', 'B1', 'B2', 'B3']
})
df2 = pd.DataFrame({
'C': ['C0', 'C1', 'C2', 'C3'],
'D': ['D0', 'D1', 'D2', 'D3']
})
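# axis=1 places the frames side by side, aligning rows on their index labels
result = pd.concat([df1, df2], axis=1)
print(result)
Output: a DataFrame with four rows (index 0 through 3) and columns A, B, C and D.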
Example 3: Ignoring the Index
import pandas as pd
df1 = pd.DataFrame({
'A': ['A0', 'A1', 'A2', 'A3'],
'B': ['B0', 'B1', 'B2', 'B3']
})
df2 = pd.DataFrame({
'A': ['A4', 'A5', 'A6', 'A7'],
'B': ['B4', 'B5', 'B6', 'B7']
})
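# ignore_index=True discards the original labels and renumbers the rows from 0
result = pd.concat([df1, df2], ignore_index=True)
print(result)
Output: a DataFrame with eight rows and columns A and B, indexed 0 through 7 instead of repeating 0 through 3.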
Example 4: Adding MultiIndex Keys
import pandas as pd
df1 = pd.DataFrame({
'A': ['A0', 'A1', 'A2', 'A3'],
'B': ['B0', 'B1', 'B2', 'B3']
})
df2 = pd.DataFrame({
'A': ['A4', 'A5', 'A6', 'A7'],
'B': ['B4', 'B5', 'B6', 'B7']
})
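# keys adds an outer index level that records which input each row came from
result = pd.concat([df1, df2], keys=['df1', 'df2'])
print(result)
Output: a DataFrame with eight rows and a two-level index. The outer level is 'df1' or 'df2', and the inner level holds the original labels 0 through 3.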
Example 5: Concatenation with Different Columns
import pandas as pd
df1 = pd.DataFrame({
'A': ['A0', 'A1', 'A2', 'A3'],
'B': ['B0', 'B1', 'B2', 'B3']
})
df2 = pd.DataFrame({
'B': ['B4', 'B5', 'B6', 'B7'],
'C': ['C4', 'C5', 'C6', 'C7']
})
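# The default outer join keeps the union of the columns; sort=False keeps them in order of appearance
result = pd.concat([df1, df2], sort=False)
print(result)
Output: a DataFrame with eight rows and columns A, B and C. The rows from df1 contain NaN in column C and the rows from df2 contain NaN in column A.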
Example 6: Using the join Parameter
import pandas as pd
df1 = pd.DataFrame({
'A': ['A0', 'A1', 'A2', 'A3'],
'B': ['B0', 'B1', 'B2', 'B3']
})
df2 = pd.DataFrame({
'B': ['B4', 'B5', 'B6', 'B7'],
'C': ['C4', 'C5', 'C6', 'C7']
})
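# join='inner' keeps only the columns present in every input (here, just B)
result = pd.concat([df1, df2], join='inner')
print(result)
Output: a DataFrame with eight rows and the single column B, the only column the two inputs share.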
Example 7: Concatenation with Different Indexes
import pandas as pd
df1 = pd.DataFrame({
'A': ['A0', 'A1', 'A2', 'A3'],
'B': ['B0', 'B1', 'B2', 'B3']
}, index=[0, 1, 2, 3])
df2 = pd.DataFrame({
'A': ['A4', 'A5', 'A6', 'A7'],
'B': ['B4', 'B5', 'B6', 'B7']
}, index=[4, 5, 6, 7])
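# The explicit index labels of each frame are carried over unchanged
result = pd.concat([df1, df2])
print(result)
Output: a DataFrame with eight rows and columns A and B, indexed 0 through 7.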
Example 8: Concatenation with Non-Overlapping Indexes
import pandas as pd
df1 = pd.DataFrame({
'A': ['A0', 'A1', 'A2', 'A3'],
'B': ['B0', 'B1', 'B2', 'B3']
}, index=[0, 1, 2, 3])
df2 = pd.DataFrame({
'A': ['A4', 'A5', 'A6', 'A7'],
'B': ['B4', 'B5', 'B6', 'B7']
}, index=[8, 9, 10, 11])
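# concat does not fill the gap between the two index ranges
result = pd.concat([df1, df2])
print(result)
Output: a DataFrame with eight rows and columns A and B, indexed 0, 1, 2, 3, 8, 9, 10, 11. No rows are created for the missing labels 4 through 7.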
Example 9: Concatenation with Hierarchical Index
import pandas as pd
df1 = pd.DataFrame({
'A': ['A0', 'A1', 'A2', 'A3'],
'B': ['B0', 'B1', 'B2', 'B3']
}, index=[0, 1, 2, 3])
df2 = pd.DataFrame({
'A': ['A4', 'A5', 'A6', 'A7'],
'B': ['B4', 'B5', 'B6', 'B7']
}, index=[4, 5, 6, 7])
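# keys builds a hierarchical (MultiIndex) row index with one group per input
result = pd.concat([df1, df2], keys=['Group1', 'Group2'])
print(result)
Output: a DataFrame with eight rows and a two-level index. Rows 0 through 3 fall under 'Group1' and rows 4 through 7 under 'Group2'.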
Example 10: Concatenation with Mixed DataTypes
import pandas as pd
df1 = pd.DataFrame({
'A': [1, 2, 3, 4],
'B': ['B0', 'B1', 'B2', 'B3']
})
df2 = pd.DataFrame({
'A': [5, 6, 7, 8],
'B': ['B4', 'B5', 'B6', 'B7']
})
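# Each column keeps its own dtype because both inputs agree on it
result = pd.concat([df1, df2])
print(result)
Output: a DataFrame with eight rows in which column A keeps its integer dtype and column B holds strings. The index repeats 0 through 3.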
Example 11: Concatenation and Retaining the Original Index
import pandas as pd
df1 = pd.DataFrame({
'A': ['A0', 'A1', 'A2', 'A3'],
'B': ['B0', 'B1', 'B2', 'B3']
})
df2 = pd.DataFrame({
'A': ['A4', 'A5', 'A6', 'A7'],
'B': ['B4', 'B5', 'B6', 'B7']
})
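# ignore_index=False is the default: the original labels are kept, even if they repeat
result = pd.concat([df1, df2], ignore_index=False)
print(result)
Output: a DataFrame with eight rows whose index runs 0 through 3 twice, so every label appears two times.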
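Keeping the original index means labels can repeat, which affects label-based lookups. The following is a minimal sketch of that consequence, using smaller hypothetical frames than the example above; reset_index is shown as one possible way to renumber afterwards, not the only one.

```python
import pandas as pd

# Smaller hypothetical frames; both use the default index 0, 1.
df1 = pd.DataFrame({'A': ['A0', 'A1']})
df2 = pd.DataFrame({'A': ['A2', 'A3']})

result = pd.concat([df1, df2])        # index labels are 0, 1, 0, 1
print(result.index.is_unique)

# A label lookup on a duplicated label returns every matching row.
rows_for_label_0 = result.loc[0]
print(rows_for_label_0)

# One option for getting unique labels back after the fact.
renumbered = result.reset_index(drop=True)
print(renumbered.index.tolist())
```

If unique labels are wanted from the start, passing ignore_index=True to concat has the same effect as the reset_index call here.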
Example 12: Concatenation with DataFrames Having Different Shapes
import pandas as pd
df1 = pd.DataFrame({
'A': ['A0', 'A1', 'A2'],
'B': ['B0', 'B1', 'B2']
})
df2 = pd.DataFrame({
'A': ['A3', 'A4', 'A5', 'A6'],
'B': ['B3', 'B4', 'B5', 'B6'],
'C': ['C3', 'C4', 'C5', 'C6']
})
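# Inputs may differ in both row count and columns; missing cells become NaN
result = pd.concat([df1, df2], sort=False)
print(result)
Output: a DataFrame with seven rows and columns A, B and C. The three rows from df1 contain NaN in column C, and the index reads 0, 1, 2, 0, 1, 2, 3.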
Example 13: Concatenation Using the sort Parameter
import pandas as pd
df1 = pd.DataFrame({
'A': ['A0', 'A1', 'A2', 'A3'],
'C': ['C0', 'C1', 'C2', 'C3']
})
df2 = pd.DataFrame({
'B': ['B4', 'B5', 'B6', 'B7'],
'C': ['C4', 'C5', 'C6', 'C7']
})
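# sort=True orders the columns; with sort=False they would appear as A, C, B
result = pd.concat([df1, df2], sort=True)
print(result)
Output: a DataFrame with eight rows and columns in sorted order A, B, C. The rows from df1 contain NaN in column B and the rows from df2 contain NaN in column A.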
Example 14: Concatenation with the copy Parameter Set to False
import pandas as pd
df1 = pd.DataFrame({
'A': ['A0', 'A1', 'A2', 'A3'],
'B': ['B0', 'B1', 'B2', 'B3']
})
df2 = pd.DataFrame({
'A': ['A4', 'A5', 'A6', 'A7'],
'B': ['B4', 'B5', 'B6', 'B7']
})
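# copy=False asks pandas to avoid unnecessary copies; newer releases may ignore or deprecate this keyword
result = pd.concat([df1, df2], copy=False)
print(result)
Output: the same eight-row DataFrame with columns A and B that a default concat produces. The copy parameter affects how the underlying data is handled, not the values in the result.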
Example 15: Concatenation with Multiple DataFrames
import pandas as pd
df1 = pd.DataFrame({
'A': ['A0', 'A1', 'A2', 'A3'],
'B': ['B0', 'B1', 'B2', 'B3']
})
df2 = pd.DataFrame({
'A': ['A4', 'A5', 'A6', 'A7'],
'B': ['B4', 'B5', 'B6', 'B7']
})
df3 = pd.DataFrame({
'A': ['A8', 'A9', 'A10', 'A11'],
'B': ['B8', 'B9', 'B10', 'B11']
})
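# objs can hold any number of frames; they are stacked in list order
result = pd.concat([df1, df2, df3])
print(result)
Output: a DataFrame with twelve rows and columns A and B. The index runs 0 through 3 three times.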
Example 16: Concatenation with Different Column Orders
import pandas as pd
df1 = pd.DataFrame({
'A': ['A0', 'A1', 'A2', 'A3'],
'B': ['B0', 'B1', 'B2', 'B3']
})
df2 = pd.DataFrame({
'B': ['B4', 'B5', 'B6', 'B7'],
'A': ['A4', 'A5', 'A6', 'A7']
})
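# Columns are aligned by name, not by position
result = pd.concat([df1, df2])
print(result)
Output: a DataFrame with eight rows and columns in the order A, B (the order of the first input). The values from df2 land in the matching columns even though df2 lists B before A.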
Example 17: Concatenation with DataFrame and Series
import pandas as pd
df = pd.DataFrame({
'A': ['A0', 'A1', 'A2', 'A3'],
'B': ['B0', 'B1', 'B2', 'B3']
})
s = pd.Series(['S1', 'S2', 'S3', 'S4'], name='S')
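# With axis=1 the Series becomes a new column named after the Series
result = pd.concat([df, s], axis=1)
print(result)
Output: a DataFrame with four rows and columns A, B and S.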
Example 18: Concatenation with Handling of NaN Values
import pandas as pd
df1 = pd.DataFrame({
'A': ['A0', 'A1', 'A2', 'A3'],
'B': ['B0', 'B1', 'B2', 'B3']
})
df2 = pd.DataFrame({
'C': ['C4', 'C5', 'C6', 'C7'],
'D': ['D4', 'D5', 'D6', 'D7']
})
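# The inputs share no columns, so every row is missing half of its values
result = pd.concat([df1, df2], sort=False)
print(result)
Output: a DataFrame with eight rows and columns A, B, C and D. The rows from df1 contain NaN in C and D, and the rows from df2 contain NaN in A and B.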
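The example above shows where the NaN values come from; what to do with them depends on the data. As a minimal sketch, assuming a placeholder string is acceptable for the missing cells, one could count and then fill them. The shortened frames and the 'missing' placeholder are arbitrary choices for illustration.

```python
import pandas as pd

# Shortened versions of the frames above: no column is shared.
df1 = pd.DataFrame({'A': ['A0', 'A1'], 'B': ['B0', 'B1']})
df2 = pd.DataFrame({'C': ['C4', 'C5'], 'D': ['D4', 'D5']})

result = pd.concat([df1, df2], ignore_index=True)

# Count the missing cells introduced in each column by the outer join.
missing_per_column = result.isna().sum()
print(missing_per_column)

# Replace them with a placeholder; 'missing' is an arbitrary choice.
filled = result.fillna('missing')
print(filled)
```

If rows with gaps are not wanted at all, join='inner' or a later dropna call are alternatives, although with no shared columns, as here, an inner join would leave no columns in the result.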
Example 19: Concatenation with Custom Index Names
import pandas as pd
df1 = pd.DataFrame({
'A': ['A0', 'A1', 'A2', 'A3'],
'B': ['B0', 'B1', 'B2', 'B3']
}, index=['pandasdataframe.com1', 'pandasdataframe.com2', 'pandasdataframe.com3', 'pandasdataframe.com4'])
df2 = pd.DataFrame({
'C': ['C4', 'C5', 'C6', 'C7'],
'D': ['D4', 'D5', 'D6', 'D7']
}, index=['pandasdataframe.com5', 'pandasdataframe.com6', 'pandasdataframe.com7', 'pandasdataframe.com8'])
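# With axis=1 rows are aligned on their labels; the two label sets do not overlap
result = pd.concat([df1, df2], axis=1)
print(result)
Output: a DataFrame with eight rows labelled pandasdataframe.com1 through pandasdataframe.com8 and columns A, B, C and D. The first four rows contain NaN in C and D, and the last four contain NaN in A and B.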
These examples cover a wide range of scenarios in which the pandas concat function can be used to combine data. Each example is self-contained and executable, giving a practical understanding of how to use concat in different contexts.