Pandas Concat Multiple DataFrames

Concatenating multiple DataFrames is a common task in data analysis, often required when you need to combine similar datasets from different sources. Pandas, the Python data manipulation library, provides the pd.concat function for this purpose. This article walks through pd.concat and its main options, with a short example for each use case.

Introduction to DataFrame Concatenation

Concatenation joins two or more DataFrames end to end, either vertically (row-wise, axis=0) or horizontally (column-wise, axis=1). The pd.concat function handles both directions, accepts Series as well as DataFrames, and aligns the inputs on the other axis.
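As a quick illustration of the two directions, the sketch below stacks two small frames both ways and compares the resulting shapes. The frames `left` and `right` are made up for this illustration.

```python
import pandas as pd

# Two tiny, hypothetical frames with the same column name and the same index.
left = pd.DataFrame({"x": [1, 2]})
right = pd.DataFrame({"x": [3, 4]})

rows = pd.concat([left, right], axis=0)  # row-wise: rows are stacked
cols = pd.concat([left, right], axis=1)  # column-wise: columns sit side by side

print(rows.shape)  # (4, 1)
print(cols.shape)  # (2, 2)
```

Note that the column-wise result here has two columns both named x, since pd.concat does not rename columns for you.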

Basic Syntax of pd.concat

The basic syntax of the pd.concat function is as follows:

import pandas as pd

# pd.concat(objs, axis=0, join='outer', ignore_index=False, keys=None, levels=None, names=None, verify_integrity=False, sort=False)
  • objs: a sequence or mapping of Series or DataFrame objects.
  • axis: {0/'index', 1/'columns'}, default 0. The axis to concatenate along.
  • join: {'inner', 'outer'}, default 'outer'. How to handle indexes on the other axis: 'outer' keeps the union of labels, 'inner' keeps the intersection.
  • ignore_index: boolean, default False. If True, do not use the index values along the concatenation axis; the result is labeled 0, ..., n - 1 instead.
  • keys: sequence, default None. Construct a hierarchical index using the passed keys as the outermost level. If multiple levels are passed, keys should contain tuples.
  • levels: list of sequences, default None. Specific levels (unique values) to use for constructing a MultiIndex.
  • names: list, default None. Names for the levels in the resulting hierarchical index.
  • verify_integrity: boolean, default False. Check whether the new concatenated axis contains duplicates, and raise an error if it does.
  • sort: boolean, default False. Sort the non-concatenation axis if it is not already aligned.
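Two of the parameters above are easier to see in action than to describe. The following is a minimal sketch, using made-up frames `a` and `b`, of passing a mapping as objs (together with names) and of verify_integrity rejecting overlapping index labels.

```python
import pandas as pd

# Hypothetical frames whose indices overlap at label 1.
a = pd.DataFrame({"v": [1, 2]}, index=[0, 1])
b = pd.DataFrame({"v": [3, 4]}, index=[1, 2])

# A mapping works as objs: its keys become the outer index level,
# and names labels the levels of the resulting MultiIndex.
stacked = pd.concat({"a": a, "b": b}, names=["source", "row"])
print(list(stacked.index.names))  # ['source', 'row']

# verify_integrity=True raises ValueError because label 1 appears twice.
try:
    pd.concat([a, b], verify_integrity=True)
    overlap_detected = False
except ValueError:
    overlap_detected = True
print(overlap_detected)  # True
```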

Examples of Concatenating DataFrames

Example 1: Basic Vertical Concatenation

import pandas as pd

df1 = pd.DataFrame({
    'A': ['A0', 'A1', 'A2', 'A3'],
    'B': ['B0', 'B1', 'B2', 'B3']
}, index=[0, 1, 2, 3])

df2 = pd.DataFrame({
    'A': ['A4', 'A5', 'A6', 'A7'],
    'B': ['B4', 'B5', 'B6', 'B7']
}, index=[4, 5, 6, 7])

result = pd.concat([df1, df2])
print(result)

Output:

    A   B
0  A0  B0
1  A1  B1
2  A2  B2
3  A3  B3
4  A4  B4
5  A5  B5
6  A6  B6
7  A7  B7
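The same call scales to any number of DataFrames, since objs is simply a list. Because pd.concat copies the data on each call, a common pattern is to collect the pieces in a list and concatenate once at the end. The sketch below assumes three made-up monthly chunks; the column names are illustrative only.

```python
import pandas as pd

# Hypothetical monthly chunks; in practice these might come from separate files.
chunks = []
for month in range(1, 4):
    chunks.append(
        pd.DataFrame({"month": [month, month], "value": [month * 10, month * 20]})
    )

# One concat call over the whole list, rather than concatenating inside the loop.
combined = pd.concat(chunks, ignore_index=True)
print(combined.shape)  # (6, 2)
```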

Example 2: Horizontal Concatenation with Different Indices

import pandas as pd

df1 = pd.DataFrame({
    'A': ['A0', 'A1', 'A2'],
    'B': ['B0', 'B1', 'B2']
}, index=[0, 1, 2])

df2 = pd.DataFrame({
    'C': ['C0', 'C1', 'C2'],
    'D': ['D0', 'D1', 'D2']
}, index=[1, 2, 3])

result = pd.concat([df1, df2], axis=1)
print(result)

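Output:

     A    B    C    D
0   A0   B0  NaN  NaN
1   A1   B1   C0   D0
2   A2   B2   C1   D1
3  NaN  NaN   C2   D2

Rows are aligned on the index, and labels present in only one DataFrame are filled with NaN.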
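If only the rows shared by both DataFrames are wanted, join='inner' can be combined with axis=1. This is a small sketch reusing the same frames as the example above.

```python
import pandas as pd

df1 = pd.DataFrame({
    'A': ['A0', 'A1', 'A2'],
    'B': ['B0', 'B1', 'B2']
}, index=[0, 1, 2])

df2 = pd.DataFrame({
    'C': ['C0', 'C1', 'C2'],
    'D': ['D0', 'D1', 'D2']
}, index=[1, 2, 3])

# Keep only index labels that appear in both frames (1 and 2).
shared = pd.concat([df1, df2], axis=1, join='inner')
print(list(shared.index))  # [1, 2]
```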

Example 3: Using ignore_index Option

import pandas as pd

df1 = pd.DataFrame({
    'A': ['A0', 'A1', 'A2'],
    'B': ['B0', 'B1', 'B2']
})

df2 = pd.DataFrame({
    'A': ['A3', 'A4', 'A5'],
    'B': ['B3', 'B4', 'B5']
})

result = pd.concat([df1, df2], ignore_index=True)
print(result)

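Output:

    A   B
0  A0  B0
1  A1  B1
2  A2  B2
3  A3  B3
4  A4  B4
5  A5  B5

The original index of each DataFrame (0 to 2) is discarded and replaced by a fresh 0 to 5 range.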
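To see why ignore_index matters, the sketch below compares the default behavior with ignore_index=True on the same two frames. Without it, index labels repeat, so a label-based lookup returns more than one row.

```python
import pandas as pd

df1 = pd.DataFrame({'A': ['A0', 'A1', 'A2'], 'B': ['B0', 'B1', 'B2']})
df2 = pd.DataFrame({'A': ['A3', 'A4', 'A5'], 'B': ['B3', 'B4', 'B5']})

# Default: each frame keeps its own 0..2 labels, so labels repeat.
dup = pd.concat([df1, df2])
print(dup.index.is_unique)  # False
print(len(dup.loc[0]))      # 2 (one row from each frame)

# ignore_index=True rebuilds a unique 0..5 index.
clean = pd.concat([df1, df2], ignore_index=True)
print(clean.index.is_unique)  # True
```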

Example 4: Concatenation with MultiIndex Using keys

import pandas as pd

df1 = pd.DataFrame({
    'A': ['A0', 'A1', 'A2'],
    'B': ['B0', 'B1', 'B2']
})

df2 = pd.DataFrame({
    'A': ['A3', 'A4', 'A5'],
    'B': ['B3', 'B4', 'B5']
})

result = pd.concat([df1, df2], keys=['Group1', 'Group2'])
print(result)

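Output:

           A   B
Group1 0  A0  B0
       1  A1  B1
       2  A2  B2
Group2 0  A3  B3
       1  A4  B4
       2  A5  B5

The keys become the outer level of a MultiIndex, so each row records which DataFrame it came from.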
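Once the keys are in the index, they can be used to pull one group back out or be turned into an ordinary column. The following sketch shows both; the level names "group" and "row" are arbitrary labels chosen for this illustration.

```python
import pandas as pd

df1 = pd.DataFrame({'A': ['A0', 'A1', 'A2'], 'B': ['B0', 'B1', 'B2']})
df2 = pd.DataFrame({'A': ['A3', 'A4', 'A5'], 'B': ['B3', 'B4', 'B5']})

# names labels the two index levels (illustrative names).
result = pd.concat([df1, df2], keys=['Group1', 'Group2'], names=['group', 'row'])

# Select every row that came from the first frame.
first = result.loc['Group1']
print(list(first['A']))  # ['A0', 'A1', 'A2']

# Move the key level out of the index into a regular column.
flat = result.reset_index(level='group')
print(list(flat.columns))  # ['group', 'A', 'B']
```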

Example 5: Inner Join on Columns

import pandas as pd

df1 = pd.DataFrame({
    'A': ['A0', 'A1', 'A2'],
    'B': ['B0', 'B1', 'B2']
})

df2 = pd.DataFrame({
    'B': ['B2', 'B3', 'B4'],
    'C': ['C2', 'C3', 'C4']
})

result = pd.concat([df1, df2], join='inner')
print(result)

Output:

    B
0  B0
1  B1
2  B2
0  B2
1  B3
2  B4

Only column B exists in both DataFrames, so join='inner' drops A and C.

Example 6: Concatenation with Different Columns and sort Option

import pandas as pd

df1 = pd.DataFrame({
    'A': ['A0', 'A1', 'A2'],
    'B': ['B0', 'B1', 'B2']
})

df2 = pd.DataFrame({
    'B': ['B3', 'B4', 'B5'],
    'C': ['C3', 'C4', 'C5']
})

result = pd.concat([df1, df2], sort=True)
print(result)

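Output:

     A   B    C
0   A0  B0  NaN
1   A1  B1  NaN
2   A2  B2  NaN
0  NaN  B3   C3
1  NaN  B4   C4
2  NaN  B5   C5

With sort=True the columns of the result are ordered alphabetically. Columns missing from one DataFrame are filled with NaN.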
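In the example above the columns already happen to be in alphabetical order, so the effect of sort is hard to see. The sketch below uses made-up frames whose columns are deliberately out of order to contrast sort=False with sort=True.

```python
import pandas as pd

# Hypothetical frames with columns in non-alphabetical order.
left = pd.DataFrame({'B': ['B0'], 'A': ['A0']})
right = pd.DataFrame({'C': ['C1'], 'A': ['A1']})

# sort=False (the default) keeps columns in order of first appearance.
unsorted_cols = pd.concat([left, right], sort=False)
print(list(unsorted_cols.columns))  # ['B', 'A', 'C']

# sort=True orders the non-concatenation axis.
sorted_cols = pd.concat([left, right], sort=True)
print(list(sorted_cols.columns))  # ['A', 'B', 'C']
```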

Example 7: Concatenation with a Series

import pandas as pd

df1 = pd.DataFrame({
    'A': ['A0', 'A1', 'A2'],
    'B': ['B0', 'B1', 'B2']
})

s1 = pd.Series(['S0', 'S1', 'S2'], name='S')

result = pd.concat([df1, s1], axis=1)
print(result)

Output:

    A   B   S
0  A0  B0  S0
1  A1  B1  S1
2  A2  B2  S2

The name of the Series becomes the label of the new column.
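Several Series can also be combined into a DataFrame in a single call. In this sketch, with made-up Series `s1` and `s2`, the column labels come from the Series names by default and can be overridden with keys.

```python
import pandas as pd

# Hypothetical Series sharing the default 0..1 index.
s1 = pd.Series([1, 2], name='a')
s2 = pd.Series([3, 4], name='b')

# Column labels default to the Series names.
frame = pd.concat([s1, s2], axis=1)
print(list(frame.columns))  # ['a', 'b']

# keys replaces the Series names as column labels.
renamed = pd.concat([s1, s2], axis=1, keys=['x', 'y'])
print(list(renamed.columns))  # ['x', 'y']
```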

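Example 8: Handling Non-Aligned Indices with join

The join_axes parameter was removed in pandas 1.0, so this example uses the join parameter. With join='outer' the result keeps the union of both indices.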

import pandas as pd

df1 = pd.DataFrame({
    'A': ['A0', 'A1', 'A2'],
    'B': ['B0', 'B1', 'B2']
}, index=[0, 1, 2])

df2 = pd.DataFrame({
    'C': ['C0', 'C1', 'C2'],
    'D': ['D0', 'D1', 'D2']
}, index=[1, 2, 3])

result = pd.concat([df1, df2], axis=1, join='outer')
print(result)

Output:

     A    B    C    D
0   A0   B0  NaN  NaN
1   A1   B1   C0   D0
2   A2   B2   C1   D1
3  NaN  NaN   C2   D2
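Code written for older pandas versions sometimes used join_axes to keep the index of one particular DataFrame. A commonly suggested replacement is to concatenate first and then call reindex, as in this sketch that keeps only the labels of df1.

```python
import pandas as pd

df1 = pd.DataFrame({
    'A': ['A0', 'A1', 'A2'],
    'B': ['B0', 'B1', 'B2']
}, index=[0, 1, 2])

df2 = pd.DataFrame({
    'C': ['C0', 'C1', 'C2'],
    'D': ['D0', 'D1', 'D2']
}, index=[1, 2, 3])

# Outer concat, then restrict the rows to df1's index labels.
aligned = pd.concat([df1, df2], axis=1).reindex(df1.index)
print(list(aligned.index))  # [0, 1, 2]
```

Row 0 still has NaN in columns C and D, because df2 has no row with that label.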

Example 9: Concatenation with Different Data Types

import pandas as pd

df1 = pd.DataFrame({
    'A': [1, 2, 3],
    'B': [4, 5, 6]
})

df2 = pd.DataFrame({
    'A': [7.1, 8.2, 9.3],
    'B': [10.4, 11.5, 12.6]
})

result = pd.concat([df1, df2])
print(result)

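Output:

     A     B
0  1.0   4.0
1  2.0   5.0
2  3.0   6.0
0  7.1  10.4
1  8.2  11.5
2  9.3  12.6

The integer values are upcast to float64 so that each column holds a single dtype.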
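The dtype change can be checked directly. The sketch below, using made-up frames, shows two common ways an integer column ends up as float64 after concatenation: mixing with floats, and NaN filling for a column that is missing from one input.

```python
import pandas as pd

ints = pd.DataFrame({'A': [1, 2, 3]})
floats = pd.DataFrame({'A': [7.1, 8.2]})

# int64 + float64 in the same column gives float64.
combined = pd.concat([ints, floats], ignore_index=True)
print(combined['A'].dtype)  # float64

# A frame without column A introduces NaN, which also forces float64.
other = pd.DataFrame({'B': [1]})
widened = pd.concat([ints, other], ignore_index=True)
print(widened['A'].dtype)  # float64
```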

Conclusion

Concatenating DataFrames is a fundamental operation in data analysis, allowing analysts to combine data from multiple sources into a single DataFrame for further analysis. The pd.concat function in Pandas is highly flexible, supporting various options to tailor the concatenation process to specific needs. By understanding and utilizing these options effectively, you can handle most data concatenation scenarios efficiently.