Pandas Concat Dataframes

Pandas is a Python library for data manipulation and analysis. One of its core operations is concatenation: combining multiple DataFrames along a particular axis. This is useful when data arrives from several sources or in several pieces and you need to work on it as a single DataFrame. This article explores the pandas.concat() function in depth and walks through its parameters with practical examples.

Understanding pandas.concat()

The pandas.concat() function joins two or more pandas DataFrames or Series along one axis: vertically, stacking rows (axis=0), or horizontally, placing columns side by side (axis=1). Its parameters control how the index along the concatenation axis is built, how labels on the other axis are aligned when they do not match, and, as a consequence, where missing values appear in the result.

Syntax of pandas.concat()

The basic syntax of pandas.concat() is as follows:

pandas.concat(objs, axis=0, join='outer', ignore_index=False, keys=None, levels=None, names=None, verify_integrity=False, sort=False, copy=True)
  • objs: a sequence or mapping of Series or DataFrame objects.
  • axis: {0, 1}, default 0. The axis to concatenate along.
  • join: {‘inner’, ‘outer’}, default ‘outer’. How to handle the indexes on the other (non-concatenation) axis: ‘outer’ takes their union, ‘inner’ their intersection.
  • ignore_index: boolean, default False. If True, do not use the index values along the concatenation axis.
  • keys: sequence, default None. Construct hierarchical index using the passed keys.
  • levels: list of sequences, default None. Specific levels (unique values) to use for constructing a MultiIndex.
  • names: list, default None. Names for the levels in the resulting hierarchical index.
  • verify_integrity: boolean, default False. Check whether the new concatenated axis contains duplicates, and raise a ValueError if it does.
  • sort: boolean, default False. Sort the non-concatenation axis if it is not already aligned.
  • copy: boolean, default True. If False, pandas avoids copying data where it can.
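The description above mentions Series as well as DataFrames, but the examples in this article only use DataFrames. The following is a minimal sketch of the same axis behavior with two small Series; the names s1, s2, x, and y are made up for illustration.

```python
import pandas as pd

# Two small Series with illustrative names
s1 = pd.Series([1, 2], name="x")
s2 = pd.Series([3, 4], name="y")

# axis=0 stacks the values into one longer Series
stacked = pd.concat([s1, s2])

# axis=1 places each Series in its own column, using the Series names as labels
side_by_side = pd.concat([s1, s2], axis=1)

print(stacked)
print(side_by_side)
```

Along axis=0 the original index labels are kept, so the stacked Series has the labels 0, 1, 0, 1 unless ignore_index=True is passed.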

Examples of Using pandas.concat()

Example 1: Basic Vertical Concatenation

import pandas as pd

# Create two DataFrames
df1 = pd.DataFrame({
    'A': ['A0', 'A1', 'A2', 'A3'],
    'B': ['B0', 'B1', 'B2', 'B3']
}, index=[0, 1, 2, 3])

df2 = pd.DataFrame({
    'A': ['A4', 'A5', 'A6', 'A7'],
    'B': ['B4', 'B5', 'B6', 'B7']
}, index=[4, 5, 6, 7])

# Concatenate DataFrames
result = pd.concat([df1, df2])
print(result)

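Output:

The result has eight rows and the columns A and B: the four rows of df1 (index 0 to 3) followed by the four rows of df2 (index 4 to 7).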

Example 2: Horizontal Concatenation with Different Indices

import pandas as pd

# Create two DataFrames
df1 = pd.DataFrame({
    'A': ['A0', 'A1', 'A2'],
    'B': ['B0', 'B1', 'B2']
}, index=[0, 1, 2])

df2 = pd.DataFrame({
    'C': ['C0', 'C1', 'C2'],
    'D': ['D0', 'D1', 'D2']
}, index=[1, 2, 3])

# Concatenate DataFrames horizontally
result = pd.concat([df1, df2], axis=1)
print(result)

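Output:

The result has the columns A, B, C, and D and the index 0 to 3, the union of both indexes. Row 0 has NaN in C and D, and row 3 has NaN in A and B, because those labels exist in only one of the two DataFrames.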

Example 3: Concatenation with MultiIndex Keys

import pandas as pd

# Create two DataFrames
df1 = pd.DataFrame({
    'A': ['A0', 'A1', 'A2'],
    'B': ['B0', 'B1', 'B2']
})

df2 = pd.DataFrame({
    'A': ['A3', 'A4', 'A5'],
    'B': ['B3', 'B4', 'B5']
})

# Concatenate with keys
result = pd.concat([df1, df2], keys=['pandasdataframe.com1', 'pandasdataframe.com2'])
print(result)

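Output:

The result has a two-level MultiIndex: the outer level holds the key of the source DataFrame, and the inner level holds the original row labels 0 to 2.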
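Once keys are attached, each original block can be pulled back out of the result by its key. The sketch below assumes shortened, made-up keys ('first' and 'second') and smaller frames than the example above.

```python
import pandas as pd

df1 = pd.DataFrame({'A': ['A0', 'A1'], 'B': ['B0', 'B1']})
df2 = pd.DataFrame({'A': ['A2', 'A3'], 'B': ['B2', 'B3']})

# Illustrative keys; any hashable labels would work
result = pd.concat([df1, df2], keys=['first', 'second'])

# Select every row that came from df2 by its outer key
second_block = result.loc['second']

# Select one row by (key, original row label); this returns a Series
single_row = result.loc[('first', 1)]

print(second_block)
print(single_row)
```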

Example 4: Ignoring the Index

import pandas as pd

# Create two DataFrames
df1 = pd.DataFrame({
    'A': ['A0', 'A1', 'A2'],
    'B': ['B0', 'B1', 'B2']
})

df2 = pd.DataFrame({
    'A': ['A3', 'A4', 'A5'],
    'B': ['B3', 'B4', 'B5']
})

# Concatenate ignoring the index
result = pd.concat([df1, df2], ignore_index=True)
print(result)

Output:

The result has six rows with a fresh index running from 0 to 5; the original row labels are discarded.

Example 5: Concatenation with Different Columns

import pandas as pd

# Create two DataFrames
df1 = pd.DataFrame({
    'A': ['A0', 'A1', 'A2'],
    'B': ['B0', 'B1', 'B2']
})

df2 = pd.DataFrame({
    'C': ['C0', 'C1', 'C2'],
    'D': ['D0', 'D1', 'D2']
})

# Concatenate DataFrames with different columns
result = pd.concat([df1, df2], sort=True)
print(result)

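Output:

The result has six rows and the columns A, B, C, and D. The rows from df1 have NaN in C and D, and the rows from df2 have NaN in A and B. The row labels 0, 1, 2 appear twice because the index is not reset.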

Example 6: Vertical Concatenation with Non-Alignment Handling

import pandas as pd

# Create two DataFrames
df1 = pd.DataFrame({
    'A': ['A0', 'A1', 'A2'],
    'B': ['B0', 'B1', 'B2']
}, index=[0, 1, 2])

df2 = pd.DataFrame({
    'A': ['A3', 'A4', 'A5'],
    'B': ['B3', 'B4', 'B5']
}, index=[2, 3, 4])

# Concatenate vertically; join='outer' governs column alignment only,
# so the row label 2, present in both DataFrames, appears twice in the result
result = pd.concat([df1, df2], axis=0, join='outer')
print(result)

Output:

The result has six rows with the index 0, 1, 2, 2, 3, 4. The label 2 occurs twice, once from each DataFrame.

Example 7: Inner Join on Horizontal Concatenation

import pandas as pd

# Create two DataFrames
df1 = pd.DataFrame({
    'A': ['A0', 'A1', 'A2'],
    'B': ['B0', 'B1', 'B2']
}, index=[0, 1, 2])

df2 = pd.DataFrame({
    'C': ['C0', 'C1', 'C2'],
    'D': ['D0', 'D1', 'D2']
}, index=[1, 2, 3])

# Concatenate DataFrames horizontally with inner join
result = pd.concat([df1, df2], axis=1, join='inner')
print(result)

Output:

The result has the columns A, B, C, and D but only the rows labeled 1 and 2, the labels present in both indexes.

Example 8: Concatenation with Verification of Integrity

import pandas as pd

# Create two DataFrames
df1 = pd.DataFrame({
    'A': ['A0', 'A1', 'A2'],
    'B': ['B0', 'B1', 'B2']
}, index=[0, 1, 2])

df2 = pd.DataFrame({
    'A': ['A3', 'A4', 'A5'],
    'B': ['B3', 'B4', 'B5']
}, index=[2, 3, 4])

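# Index label 2 appears in both DataFrames, so verify_integrity=True raises a ValueError
try:
    result = pd.concat([df1, df2], verify_integrity=True)
    print(result)
except ValueError as exc:
    print(f"Concatenation failed: {exc}")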

Example 9: Concatenation with MultiIndex and Custom Names

import pandas as pd

# Create two DataFrames
df1 = pd.DataFrame({
    'A': ['A0', 'A1', 'A2'],
    'B': ['B0', 'B1', 'B2']
})

df2 = pd.DataFrame({
    'A': ['A3', 'A4', 'A5'],
    'B': ['B3', 'B4', 'B5']
})

# Concatenate with MultiIndex and custom names
result = pd.concat([df1, df2], keys=['pandasdataframe.com1', 'pandasdataframe.com2'], names=['Source', 'Row ID'])
print(result)

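Output:

The result has a two-level MultiIndex whose levels are named Source (the key of the originating DataFrame) and Row ID (the original row label).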

Example 10: Handling Missing Data in Concatenation

import pandas as pd

# Create two DataFrames with missing data
df1 = pd.DataFrame({
    'A': ['A0', 'A1', 'A2'],
    'B': ['B0', None, 'B2']
})

df2 = pd.DataFrame({
    'A': [None, 'A4', 'A5'],
    'B': ['B3', 'B4', 'B5']
})

# Concatenate DataFrames vertically
result = pd.concat([df1, df2])
print(result)

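Output:

The result has six rows. The missing values in column B of df1 and column A of df2 are carried through unchanged; concatenation neither fills nor drops them.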
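After a concatenation that introduces or carries missing values, it is often useful to count them and decide how to fill them. The sketch below is one possible follow-up, not part of pandas.concat() itself; the frames and the fill value 'missing' are illustrative assumptions.

```python
import pandas as pd

# df1 has a missing value in B; df2 lacks column B entirely and adds column C
df1 = pd.DataFrame({'A': ['A0', 'A1'], 'B': ['B0', None]})
df2 = pd.DataFrame({'A': ['A2', 'A3'], 'C': ['C2', 'C3']})

result = pd.concat([df1, df2], ignore_index=True)

# Count missing values per column after concatenation
missing_per_column = result.isna().sum()

# Replace every missing value with an illustrative placeholder
filled = result.fillna('missing')

print(missing_per_column)
print(filled)
```

Column B ends up with three missing values: the original None from df1 plus the two rows contributed by df2, which has no column B.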

Example 11: Concatenation with Different Columns and Outer Join

import pandas as pd

# Create two DataFrames with different columns
df1 = pd.DataFrame({
    'A': ['A0', 'A1', 'A2'],
    'B': ['B0', 'B1', 'B2']
})

df2 = pd.DataFrame({
    'C': ['C0', 'C1', 'C2'],
    'D': ['D0', 'D1', 'D2']
})

# Concatenate DataFrames horizontally with outer join
result = pd.concat([df1, df2], axis=1, join='outer')
print(result)

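Output:

Both DataFrames share the default index 0 to 2, so the result has three rows and the columns A, B, C, and D with no missing values.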

Example 12: Concatenation with Custom Index

import pandas as pd

# Create two DataFrames
df1 = pd.DataFrame({
    'A': ['A0', 'A1', 'A2'],
    'B': ['B0', 'B1', 'B2']
}, index=['pandasdataframe.com0', 'pandasdataframe.com1', 'pandasdataframe.com2'])

df2 = pd.DataFrame({
    'C': ['C0', 'C1', 'C2'],
    'D': ['D0', 'D1', 'D2']
}, index=['pandasdataframe.com1', 'pandasdataframe.com2', 'pandasdataframe.com3'])

# Concatenate DataFrames horizontally
result = pd.concat([df1, df2], axis=1)
print(result)

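Output:

The result has four rows, labeled pandasdataframe.com0 to pandasdataframe.com3. The first row has NaN in C and D, and the last row has NaN in A and B.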

Example 13: Concatenation with Sorting

import pandas as pd

# Create two DataFrames
df1 = pd.DataFrame({
    'B': ['B2', 'B3', 'B1'],
    'A': ['A2', 'A3', 'A1']
}, index=[2, 3, 1])

df2 = pd.DataFrame({
    'B': ['B5', 'B4', 'B6'],
    'A': ['A5', 'A4', 'A6']
}, index=[5, 4, 6])

# Concatenate vertically; sort=True applies to the column labels
# (the non-concatenation axis), not to the row index
result = pd.concat([df1, df2], sort=True)
print(result)

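Output:

The result has six rows in their original order (index 2, 3, 1, 5, 4, 6). The sort parameter affects only the column labels; to order the rows, call sort_index() on the result.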

Example 14: Concatenation with Copy False

import pandas as pd

# Create two DataFrames
df1 = pd.DataFrame({
    'A': ['A0', 'A1', 'A2'],
    'B': ['B0', 'B1', 'B2']
})

df2 = pd.DataFrame({
    'A': ['A3', 'A4', 'A5'],
    'B': ['B3', 'B4', 'B5']
})

# Ask pandas to avoid unnecessary copies; depending on the pandas version
# and the data layout, a copy may still be made
result = pd.concat([df1, df2], copy=False)
print(result)

Output:

The printed result is the same as with the default copy setting: six rows with the index 0, 1, 2, 0, 1, 2.

Example 15: Concatenation with Hierarchical Index Using Levels

import pandas as pd

# Create two DataFrames
df1 = pd.DataFrame({
    'A': ['A0', 'A1', 'A2'],
    'B': ['B0', 'B1', 'B2']
})

df2 = pd.DataFrame({
    'A': ['A3', 'A4', 'A5'],
    'B': ['B3', 'B4', 'B5']
})

# Concatenate with hierarchical index using levels
result = pd.concat([df1, df2], keys=['Level1', 'Level2'], levels=[['Level1', 'Level2', 'Level3']], names=['Level'])
print(result)

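Output:

The result has a two-level MultiIndex whose outer level is named Level. The rows from df1 carry the label Level1 and the rows from df2 carry Level2; Level3 is declared as a possible level value but no row uses it.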

Example 16: Concatenation with Non-Unique Indexes

import pandas as pd

# Create two DataFrames with non-unique indexes
df1 = pd.DataFrame({
    'A': ['A0', 'A1', 'A2'],
    'B': ['B0', 'B1', 'B2']
}, index=[1, 2, 2])

df2 = pd.DataFrame({
    'A': ['A3', 'A4', 'A5'],
    'B': ['B3', 'B4', 'B5']
}, index=[2, 3, 3])

# Concatenate DataFrames with non-unique indexes
result = pd.concat([df1, df2])
print(result)

Output:

The result has six rows with the index 1, 2, 2, 2, 3, 3. Duplicate labels are allowed by default and are kept as they are.
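Because duplicate labels pass through silently by default, it can help to check for them after concatenating. The following sketch shows one way to do that with standard Index methods; the variable names dup_labels and clean are illustrative.

```python
import pandas as pd

df1 = pd.DataFrame({'A': ['A0', 'A1', 'A2']}, index=[1, 2, 2])
df2 = pd.DataFrame({'A': ['A3', 'A4', 'A5']}, index=[2, 3, 3])

result = pd.concat([df1, df2])

# True only when every row label occurs exactly once
print(result.index.is_unique)

# Labels that occur more than once in the concatenated index
dup_labels = result.index[result.index.duplicated()].unique()
print(list(dup_labels))

# One way to get unique labels back: discard the old index entirely
clean = result.reset_index(drop=True)
print(clean.index.is_unique)
```

Whether to reset the index, keep the duplicates, or raise an error with verify_integrity=True depends on whether the row labels carry meaning in your data.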

Example 17: Concatenation Using a Mapping

import pandas as pd

# Create a dictionary of DataFrames
data_frames = {
    'df1': pd.DataFrame({
        'A': ['A0', 'A1', 'A2'],
        'B': ['B0', 'B1', 'B2']
    }),
    'df2': pd.DataFrame({
        'A': ['A3', 'A4', 'A5'],
        'B': ['B3', 'B4', 'B5']
    })
}

# Concatenate using a mapping
result = pd.concat(data_frames)
print(result)

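Output:

When a mapping is passed, its keys are used as the keys argument, so the result has a two-level MultiIndex with df1 and df2 as the outer labels.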

Example 18: Concatenation with Different Column Orders

import pandas as pd

# Create two DataFrames with different column orders
df1 = pd.DataFrame({
    'A': ['A0', 'A1', 'A2'],
    'B': ['B0', 'B1', 'B2']
})

df2 = pd.DataFrame({
    'B': ['B3', 'B4', 'B5'],
    'A': ['A3', 'A4', 'A5']
})

# Concatenate DataFrames with different column orders
result = pd.concat([df1, df2])
print(result)

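Output:

Columns are matched by name, not by position, so the result has six rows with the values of A under A and the values of B under B, in the column order of the first DataFrame.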

Example 19: Concatenation with All Parameters Used

import pandas as pd

# Create two DataFrames
df1 = pd.DataFrame({
    'A': ['A0', 'A1', 'A2'],
    'B': ['B0', 'B1', 'B2']
})

df2 = pd.DataFrame({
    'A': ['A3', 'A4', 'A5'],
    'B': ['B3', 'B4', 'B5']
})

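# Concatenate using all parameters.
# Note: ignore_index=True replaces the concatenation axis with 0..n-1,
# so keys and names have no effect on the result here
result = pd.concat([df1, df2], axis=0, join='outer', ignore_index=True, keys=['Group1', 'Group2'], levels=None, names=['Group'], verify_integrity=False, sort=False, copy=True)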
print(result)

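Output:

The result has six rows with a plain index from 0 to 5. Because ignore_index=True is set, the keys Group1 and Group2 and the level name Group do not appear.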
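The interaction between keys and ignore_index is easy to miss, so here is a small sketch that compares the two calls directly. The frames are reduced to a single column for brevity.

```python
import pandas as pd

df1 = pd.DataFrame({'A': ['A0', 'A1', 'A2']})
df2 = pd.DataFrame({'A': ['A3', 'A4', 'A5']})

# keys alone build a two-level MultiIndex
with_keys = pd.concat([df1, df2], keys=['Group1', 'Group2'], names=['Group'])

# adding ignore_index=True discards the keys and yields a flat 0..5 index
both = pd.concat([df1, df2], keys=['Group1', 'Group2'], names=['Group'],
                 ignore_index=True)

print(with_keys.index.nlevels)  # 2
print(both.index.nlevels)       # 1
```

If you need both a group label and a fresh row number, one option is to keep the keys and call reset_index() on the result afterwards.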

Example 20: Concatenation with Empty DataFrame

import pandas as pd

# Create an empty DataFrame and a non-empty DataFrame
df_empty = pd.DataFrame()
df_non_empty = pd.DataFrame({
    'A': ['A0', 'A1', 'A2'],
    'B': ['B0', 'B1', 'B2']
})

# Concatenate an empty DataFrame with a non-empty DataFrame
result = pd.concat([df_empty, df_non_empty])
print(result)

Output:

The result contains only the three rows and the columns A and B of df_non_empty; the empty DataFrame contributes nothing.

These examples illustrate various ways to use the pandas.concat() function to handle different data concatenation scenarios effectively. By understanding and utilizing these examples, you can manage and manipulate large datasets efficiently in your data analysis projects.
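When many pieces have to be combined, a common pattern is to collect them in a list and call pandas.concat() once, since every call builds a new DataFrame. The sketch below uses small generated frames as stand-ins for data loaded from several sources; the column names source and value are hypothetical.

```python
import pandas as pd

# Hypothetical chunks standing in for data read from several files
chunks = [
    pd.DataFrame({'source': [f'file{i}'] * 2, 'value': [i, i + 10]})
    for i in range(3)
]

# One concat call over the whole list, with a fresh 0..n-1 index
combined = pd.concat(chunks, ignore_index=True)

print(combined)
```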