Pandas Concat DataFrame

Pandas is a powerful Python library used for data manipulation and analysis. One of the most useful functionalities provided by Pandas is the ability to concatenate DataFrames. Concatenation is the process of combining two or more DataFrames along a particular axis (either rows or columns) to create a new DataFrame. This article will explore the various ways to use the pd.concat() function to concatenate DataFrames, along with detailed examples.

Understanding pd.concat()

The pd.concat() function in Pandas is versatile and can be used to combine multiple DataFrames or Series objects horizontally (side by side) or vertically (stacked on top of each other). The basic syntax of pd.concat() is:

pandas.concat(objs, axis=0, join='outer', ignore_index=False, keys=None, levels=None, names=None, verify_integrity=False, sort=False, copy=True)
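  • objs: a sequence or mapping of Series or DataFrame objects. If a mapping is passed, its keys are used as the keys argument unless keys is given explicitly.
  • axis: {0/'index', 1/'columns'}, default 0. The axis to concatenate along.
  • join: {'inner', 'outer'}, default 'outer'. How to handle the indexes on the other axis (or axes).
  • ignore_index: bool, default False. If True, do not use the index values along the concatenation axis; the resulting axis is labeled 0, 1, ..., n - 1.
  • keys: sequence, default None. Builds a hierarchical index using the passed keys as the outermost level. If multiple levels are passed, it should contain tuples.
  • levels: list of sequences, default None. Specific levels (unique values) to use for constructing a MultiIndex; otherwise they are inferred from the keys.
  • names: list, default None. Names for the levels in the resulting hierarchical index.
  • verify_integrity: bool, default False. Check whether the new concatenated axis contains duplicates and raise a ValueError if it does.
  • sort: bool, default False. Sort the non-concatenation axis if it is not already aligned.
  • copy: bool, default True. If False, do not copy data unnecessarily.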
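To see how axis and join interact, the sketch below applies all four combinations to two small made-up frames (left and right are illustrative names, not from the examples that follow). The frames share one column label ('B') and one row label (1), and the loop prints each result's shape and labels.

```python
import pandas as pd

# Illustrative frames: they share column 'B' and index label 1
left = pd.DataFrame({'A': [1, 2], 'B': [3, 4]}, index=[0, 1])
right = pd.DataFrame({'B': [5, 6], 'C': [7, 8]}, index=[1, 2])

# axis=0 stacks rows; join='outer' (the default) keeps the union of columns
rows_outer = pd.concat([left, right], axis=0)
# axis=0 with join='inner' keeps only the columns both frames share
rows_inner = pd.concat([left, right], axis=0, join='inner')
# axis=1 places frames side by side; the union of row labels is kept
cols_outer = pd.concat([left, right], axis=1)
# axis=1 with join='inner' keeps only the row labels both frames share
cols_inner = pd.concat([left, right], axis=1, join='inner')

for label, frame in [('rows_outer', rows_outer), ('rows_inner', rows_inner),
                     ('cols_outer', cols_outer), ('cols_inner', cols_inner)]:
    print(label, frame.shape, list(frame.columns), list(frame.index))
```

In other words, join always acts on the axis you are not concatenating along.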

Example 1: Basic Vertical Concatenation

import pandas as pd

# Create two DataFrames
df1 = pd.DataFrame({
    'A': ['A0', 'A1', 'A2', 'A3'],
    'B': ['B0', 'B1', 'B2', 'B3'],
    'C': ['C0', 'C1', 'C2', 'C3'],
    'D': ['D0', 'D1', 'D2', 'D3']
}, index=[0, 1, 2, 3])

df2 = pd.DataFrame({
    'A': ['A4', 'A5', 'A6', 'A7'],
    'B': ['B4', 'B5', 'B6', 'B7'],
    'C': ['C4', 'C5', 'C6', 'C7'],
    'D': ['D4', 'D5', 'D6', 'D7']
}, index=[4, 5, 6, 7])

# Concatenate DataFrames
result = pd.concat([df1, df2])
print(result)

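The output is a single eight-row DataFrame with columns A through D: the rows of df2 follow those of df1, and the original index labels 0 through 7 are kept.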
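Example 1 passes a list, but objs can be any iterable of DataFrames, such as a generator. The pandas documentation also notes that None entries in objs are dropped silently (a ValueError is raised only if every entry is None). A minimal sketch, where make_chunk is a hypothetical helper that builds two rows per call:

```python
import pandas as pd

# Hypothetical helper: builds one two-row chunk starting at the given label
def make_chunk(start):
    return pd.DataFrame({'A': [f'A{start}', f'A{start + 1}']},
                        index=[start, start + 1])

# A generator expression works as objs; pandas materializes it internally
chunks = (make_chunk(i) for i in range(0, 6, 2))
result = pd.concat(chunks)

# None entries are skipped rather than raising an error
with_none = pd.concat([make_chunk(0), None, make_chunk(2)])

print(result)
print(with_none)
```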

Example 2: Horizontal Concatenation with Different Indices

import pandas as pd

# Create two DataFrames with different indices
df1 = pd.DataFrame({
    'A': ['A0', 'A1', 'A2'],
    'B': ['B0', 'B1', 'B2']
}, index=[0, 1, 2])

df2 = pd.DataFrame({
    'C': ['C0', 'C1', 'C2'],
    'D': ['D0', 'D1', 'D2']
}, index=[1, 2, 3])

# Concatenate DataFrames along columns
result = pd.concat([df1, df2], axis=1)
print(result)

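The output has the union of both indexes (0, 1, 2, 3) and all four columns. Because the default join is 'outer', row 0 has NaN in columns C and D, and row 3 has NaN in columns A and B.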
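If the NaN padding from Example 2 is unwanted, join='inner' keeps only the row labels that both frames share. A short sketch with the same frames:

```python
import pandas as pd

# Same frames as in Example 2
df1 = pd.DataFrame({'A': ['A0', 'A1', 'A2'], 'B': ['B0', 'B1', 'B2']}, index=[0, 1, 2])
df2 = pd.DataFrame({'C': ['C0', 'C1', 'C2'], 'D': ['D0', 'D1', 'D2']}, index=[1, 2, 3])

# join='inner' keeps only the labels present in both frames (1 and 2),
# so no NaN values are introduced
result = pd.concat([df1, df2], axis=1, join='inner')
print(result)
```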

Example 3: Concatenation with Inner Join

import pandas as pd

# Create two DataFrames
df1 = pd.DataFrame({
    'A': ['A0', 'A1', 'A2'],
    'B': ['B0', 'B1', 'B2']
}, index=[0, 1, 2])

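# df2 shares only column 'B' with df1
df2 = pd.DataFrame({
    'B': ['B3', 'B4', 'B5'],
    'C': ['C3', 'C4', 'C5']
}, index=[2, 3, 4])

# Concatenate with inner join: keep only the columns present in both frames
result = pd.concat([df1, df2], axis=0, join='inner')
print(result)

The output keeps all six rows (index labels 0, 1, 2, 2, 3, 4) but only column B. With axis=0, join acts on the columns, the non-concatenation axis, so label 2 is simply repeated rather than matched.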

Example 4: Concatenation with Keys

import pandas as pd

# Create two DataFrames
df1 = pd.DataFrame({
    'A': ['A0', 'A1', 'A2'],
    'B': ['B0', 'B1', 'B2']
})

df2 = pd.DataFrame({
    'A': ['A3', 'A4', 'A5'],
    'B': ['B3', 'B4', 'B5']
})

# Concatenate with keys
result = pd.concat([df1, df2], keys=['x', 'y'])
print(result)

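The output has a two-level index: the outer level holds the keys 'x' and 'y', identifying which DataFrame each row came from, and the inner level holds each frame's original labels 0, 1, 2.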
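Because the keys form the outer index level, each source block can be pulled back out with .loc. A sketch using Example 4's frames:

```python
import pandas as pd

df1 = pd.DataFrame({'A': ['A0', 'A1', 'A2'], 'B': ['B0', 'B1', 'B2']})
df2 = pd.DataFrame({'A': ['A3', 'A4', 'A5'], 'B': ['B3', 'B4', 'B5']})
result = pd.concat([df1, df2], keys=['x', 'y'])

# Selecting an outer key returns that block with its original labels
from_y = result.loc['y']
# A single row is addressed with an (outer key, original label) tuple
row = result.loc[('x', 1)]

print(from_y)
print(row)
```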

Example 5: Ignoring the Index

import pandas as pd

# Create two DataFrames
df1 = pd.DataFrame({
    'A': ['A0', 'A1', 'A2'],
    'B': ['B0', 'B1', 'B2']
})

df2 = pd.DataFrame({
    'A': ['A3', 'A4', 'A5'],
    'B': ['B3', 'B4', 'B5']
})

# Concatenate ignoring the index
result = pd.concat([df1, df2], ignore_index=True)
print(result)

The output has six rows with a fresh index 0 through 5 instead of the repeated labels 0, 1, 2, 0, 1, 2.
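ignore_index=True gives the same result as concatenating with the original labels and renumbering afterwards with reset_index(drop=True). The sketch below checks this on Example 5's frames using DataFrame.equals:

```python
import pandas as pd

df1 = pd.DataFrame({'A': ['A0', 'A1', 'A2'], 'B': ['B0', 'B1', 'B2']})
df2 = pd.DataFrame({'A': ['A3', 'A4', 'A5'], 'B': ['B3', 'B4', 'B5']})

# Discard the original labels during concatenation...
direct = pd.concat([df1, df2], ignore_index=True)
# ...or concatenate first and renumber the rows afterwards
renumbered = pd.concat([df1, df2]).reset_index(drop=True)

print(direct.equals(renumbered))
```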

Example 6: Adding MultiIndex Keys

import pandas as pd

# Create two DataFrames
df1 = pd.DataFrame({
    'A': ['A0', 'A1', 'A2'],
    'B': ['B0', 'B1', 'B2']
})

df2 = pd.DataFrame({
    'A': ['A3', 'A4', 'A5'],
    'B': ['B3', 'B4', 'B5']
})

# Concatenate with MultiIndex keys
result = pd.concat([df1, df2], keys=['first', 'second'], names=['Source', 'Row'])
print(result)

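The output has the same two-level index as Example 4, but the levels are now named Source ('first' or 'second') and Row (the original labels 0, 1, 2).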
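Named levels can then be used in place of level positions, for instance with DataFrame.xs and groupby(level=...). A sketch reusing Example 6's frames:

```python
import pandas as pd

df1 = pd.DataFrame({'A': ['A0', 'A1', 'A2'], 'B': ['B0', 'B1', 'B2']})
df2 = pd.DataFrame({'A': ['A3', 'A4', 'A5'], 'B': ['B3', 'B4', 'B5']})
result = pd.concat([df1, df2], keys=['first', 'second'], names=['Source', 'Row'])

# The level names assigned by names=
print(list(result.index.names))
# Cross-section: all rows whose Source level equals 'second'
print(result.xs('second', level='Source'))
# Number of rows contributed by each source
print(result.groupby(level='Source').size())
```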

Example 7: Concatenation with Different Columns

import pandas as pd

# Create two DataFrames with different columns
df1 = pd.DataFrame({
    'A': ['A0', 'A1', 'A2'],
    'B': ['B0', 'B1', 'B2']
})

df2 = pd.DataFrame({
    'C': ['C0', 'C1', 'C2'],
    'D': ['D0', 'D1', 'D2']
})

# Concatenate DataFrames with different columns
result = pd.concat([df1, df2], sort=False)
print(result)

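The output has six rows (index 0, 1, 2, 0, 1, 2) and columns A, B, C, D. Rows from df1 have NaN in C and D, and rows from df2 have NaN in A and B.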
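Example 7 passes sort=False, which is already the default. The sketch below uses made-up frames whose column labels are out of alphabetical order to contrast it with sort=True, which sorts the non-concatenation axis (here, the columns) when the inputs are not aligned:

```python
import pandas as pd

# Illustrative frames with column labels deliberately out of order
df1 = pd.DataFrame({'B': [1, 2], 'A': [3, 4]})
df2 = pd.DataFrame({'D': [5, 6], 'C': [7, 8]})

# sort=False keeps columns in order of first appearance
unsorted = pd.concat([df1, df2], sort=False)
# sort=True sorts the combined column labels
sorted_cols = pd.concat([df1, df2], sort=True)

print(list(unsorted.columns))
print(list(sorted_cols.columns))
```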

Example 8: Verifying Integrity

import pandas as pd

# Create two DataFrames
df1 = pd.DataFrame({
    'A': ['A0', 'A1', 'A2'],
    'B': ['B0', 'B1', 'B2']
}, index=[0, 1, 2])

df2 = pd.DataFrame({
    'A': ['A3', 'A4', 'A5'],
    'B': ['B3', 'B4', 'B5']
}, index=[2, 3, 4])

# Concatenate with integrity check
try:
    result = pd.concat([df1, df2], verify_integrity=True)
    print(result)
except ValueError as e:
    print("ValueError:", e)

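The output is an error message rather than a DataFrame: index label 2 appears in both inputs, so verify_integrity=True makes pd.concat raise a ValueError listing the overlapping values, which the except block prints.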
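Rather than waiting for the ValueError, overlapping labels can be found up front with Index.intersection, or after an unchecked concatenation with Index.duplicated. A sketch using the frames from Example 8:

```python
import pandas as pd

df1 = pd.DataFrame({'A': ['A0', 'A1', 'A2'], 'B': ['B0', 'B1', 'B2']}, index=[0, 1, 2])
df2 = pd.DataFrame({'A': ['A3', 'A4', 'A5'], 'B': ['B3', 'B4', 'B5']}, index=[2, 3, 4])

# Labels that would collide, found before calling concat
overlap = df1.index.intersection(df2.index)
print(list(overlap))

# Or concatenate without the check and inspect the index afterwards
combined = pd.concat([df1, df2])
print(combined.index.is_unique)
# duplicated() flags every repeat after the first occurrence
print(list(combined.index[combined.index.duplicated()]))
```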

Example 9: Concatenation with No Copy

import pandas as pd

# Create two DataFrames
df1 = pd.DataFrame({
    'A': ['A0', 'A1', 'A2'],
    'B': ['B0', 'B1', 'B2']
})

df2 = pd.DataFrame({
    'A': ['A3', 'A4', 'A5'],
    'B': ['B3', 'B4', 'B5']
})

# copy=False asks pandas to skip unnecessary copies; stacking rows still builds new arrays
result = pd.concat([df1, df2], copy=False)
print(result)

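The output is the same six-row DataFrame (index 0, 1, 2, 0, 1, 2) that the default copy=True produces; the keyword affects memory use, not the values.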

Example 10: Handling Non-Alignment with reindex

import pandas as pd

# Create two DataFrames with non-aligned indices
df1 = pd.DataFrame({
    'A': ['A0', 'A1', 'A2'],
    'B': ['B0', 'B1', 'B2']
}, index=[0, 1, 2])

df2 = pd.DataFrame({
    'C': ['C0', 'C1', 'C2'],
    'D': ['D0', 'D1', 'D2']
}, index=[1, 2, 3])

# Reindex df2 before concatenation
df2_reindexed = df2.reindex(df1.index)

# Concatenate DataFrames along columns
result = pd.concat([df1, df2_reindexed], axis=1)
print(result)

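The output keeps df1's index (0, 1, 2) and has columns A through D. Row 0 has NaN in C and D because df2 has no label 0, and df2's row 3 was dropped by the reindex.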
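Reindexing first is one way to keep only df1's rows. The sketch below shows two routes that should produce the same frame: concatenating first and reindexing afterwards, or DataFrame.join, whose default is a left join on the index. The equivalence is checked with DataFrame.equals:

```python
import pandas as pd

df1 = pd.DataFrame({'A': ['A0', 'A1', 'A2'], 'B': ['B0', 'B1', 'B2']}, index=[0, 1, 2])
df2 = pd.DataFrame({'C': ['C0', 'C1', 'C2'], 'D': ['D0', 'D1', 'D2']}, index=[1, 2, 3])

# Approach from Example 10: align df2 to df1's labels before concatenating
via_reindex = pd.concat([df1, df2.reindex(df1.index)], axis=1)
# Concatenate everything, then keep only df1's labels
via_concat_first = pd.concat([df1, df2], axis=1).reindex(df1.index)
# Index-based left join
via_join = df1.join(df2)

print(via_reindex.equals(via_concat_first), via_reindex.equals(via_join))
```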

Example 11: Concatenation with Different Column Orders

import pandas as pd

# Create two DataFrames with different column orders
df1 = pd.DataFrame({
    'A': ['A0', 'A1', 'A2'],
    'B': ['B0', 'B1', 'B2']
})

df2 = pd.DataFrame({
    'B': ['B3', 'B4', 'B5'],
    'A': ['A3', 'A4', 'A5']
})

# Concatenate DataFrames
result = pd.concat([df1, df2])
print(result)

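The output has six rows (index 0, 1, 2, 0, 1, 2) and columns in df1's order, A then B. pd.concat aligns columns by label rather than by position, so df2's values still land under the right headings.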

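Example 12: Replacing append with concat

DataFrame.append() was deprecated in pandas 1.4 and removed in pandas 2.0, so df1.append(df2) raises an AttributeError on current versions. The same result comes from pd.concat():

import pandas as pd

# Create two DataFrames
df1 = pd.DataFrame({
    'A': ['A0', 'A1', 'A2'],
    'B': ['B0', 'B1', 'B2']
})

df2 = pd.DataFrame({
    'A': ['A3', 'A4', 'A5'],
    'B': ['B3', 'B4', 'B5']
})

# Equivalent of the removed df1.append(df2)
result = pd.concat([df1, df2])
print(result)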
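When rows arrive in a loop, the pandas documentation warns that pd.concat makes a full copy of the data, so calling it on every iteration gets expensive. The usual pattern is to collect the pieces in a Python list and concatenate once. A minimal sketch, where batches is hypothetical stand-in data:

```python
import pandas as pd

# Hypothetical data arriving in batches
batches = [
    {'A': ['A0', 'A1'], 'B': ['B0', 'B1']},
    {'A': ['A2', 'A3'], 'B': ['B2', 'B3']},
    {'A': ['A4', 'A5'], 'B': ['B4', 'B5']},
]

# Collect DataFrames in a plain list (list.append is cheap)...
frames = []
for batch in batches:
    frames.append(pd.DataFrame(batch))

# ...then concatenate once, renumbering the rows
result = pd.concat(frames, ignore_index=True)
print(result)
```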

Example 13: Concatenation with Series

import pandas as pd

# Create a DataFrame and a Series
df = pd.DataFrame({
    'A': ['A0', 'A1', 'A2'],
    'B': ['B0', 'B1', 'B2']
})

series = pd.Series(['S0', 'S1', 'S2'], name='S')

# Concatenate DataFrame with Series
result = pd.concat([df, series], axis=1)
print(result)

The output is a three-row DataFrame with columns A, B and S; the Series name becomes the new column's label.
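Series can also be combined with each other. In the sketch below (s1 and s2 are hypothetical unnamed Series), unnamed Series placed side by side get integer column labels, keys= supplies names instead, and axis=0 chains them into one longer Series:

```python
import pandas as pd

s1 = pd.Series(['S0', 'S1', 'S2'])
s2 = pd.Series(['T0', 'T1', 'T2'])

# Unnamed Series become columns labeled 0 and 1
unnamed = pd.concat([s1, s2], axis=1)
# keys= provides the column labels
labelled = pd.concat([s1, s2], axis=1, keys=['first', 'second'])
# Along axis=0 the result is a single Series
chained = pd.concat([s1, s2], ignore_index=True)

print(list(unnamed.columns))
print(list(labelled.columns))
print(chained.tolist())
```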

Example 14: Handling NaN Values After Concatenation

import pandas as pd

# Create two DataFrames with non-aligned indices
df1 = pd.DataFrame({
    'A': ['A0', 'A1', 'A2'],
    'B': ['B0', 'B1', 'B2']
}, index=[0, 1, 2])

df2 = pd.DataFrame({
    'C': ['C0', 'C1', 'C2'],
    'D': ['D0', 'D1', 'D2']
}, index=[2, 3, 4])

# Concatenate DataFrames along columns
result = pd.concat([df1, df2], axis=1)
print(result)

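The output covers the union of the two indexes, 0 through 4. Only row 2 is fully populated; rows 0 and 1 have NaN in C and D, and rows 3 and 4 have NaN in A and B.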
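Example 14 shows where NaN values appear; the usual next step is to count them and then fill or drop them. A minimal sketch using fillna and dropna on the same result; the placeholder 'missing' is an arbitrary choice:

```python
import pandas as pd

df1 = pd.DataFrame({'A': ['A0', 'A1', 'A2'], 'B': ['B0', 'B1', 'B2']}, index=[0, 1, 2])
df2 = pd.DataFrame({'C': ['C0', 'C1', 'C2'], 'D': ['D0', 'D1', 'D2']}, index=[2, 3, 4])
result = pd.concat([df1, df2], axis=1)

# Number of gaps the outer join introduced in each column
print(result.isna().sum())

# Option 1: replace gaps with a placeholder value
filled = result.fillna('missing')
# Option 2: keep only rows that have no missing values
complete = result.dropna()

print(filled)
print(complete)
```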

Example 15: Concatenation with Custom Index

import pandas as pd

# Create two DataFrames
df1 = pd.DataFrame({
    'A': ['A0', 'A1', 'A2'],
    'B': ['B0', 'B1', 'B2']
}, index=['first', 'second', 'third'])

df2 = pd.DataFrame({
    'C': ['C0', 'C1', 'C2'],
    'D': ['D0', 'D1', 'D2']
}, index=['second', 'third', 'fourth'])

# Concatenate DataFrames along rows
result = pd.concat([df1, df2], axis=0, sort=False)
print(result)

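The output has six rows with index first, second, third, second, third, fourth and columns A through D. Concatenating along rows stacks the frames rather than matching labels, so second and third each appear twice; rows from df1 have NaN in C and D, and rows from df2 have NaN in A and B.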
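If one row per label is wanted instead, one option is to group the stacked result on its index and take the first non-missing value in each column (GroupBy.first skips missing values by default); another is to concatenate along columns, which matches labels directly. A sketch with Example 15's frames:

```python
import pandas as pd

df1 = pd.DataFrame({'A': ['A0', 'A1', 'A2'], 'B': ['B0', 'B1', 'B2']},
                   index=['first', 'second', 'third'])
df2 = pd.DataFrame({'C': ['C0', 'C1', 'C2'], 'D': ['D0', 'D1', 'D2']},
                   index=['second', 'third', 'fourth'])

stacked = pd.concat([df1, df2], axis=0, sort=False)

# Merge the partial rows that share a label, keeping first-appearance order
merged_rows = stacked.groupby(level=0, sort=False).first()

# Concatenating along columns aligns the labels in one step
side_by_side = pd.concat([df1, df2], axis=1)

print(merged_rows)
print(side_by_side)
```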

Example 16: Concatenation with DataFrames Having MultiIndex

import pandas as pd

# Create two DataFrames with MultiIndex
index = pd.MultiIndex.from_tuples([('one', 'a'), ('one', 'b'), ('two', 'a'), ('two', 'b')])
df1 = pd.DataFrame({
    'A': ['A0', 'A1', 'A2', 'A3'],
    'B': ['B0', 'B1', 'B2', 'B3']
}, index=index)

df2 = pd.DataFrame({
    'C': ['C0', 'C1', 'C2', 'C3'],
    'D': ['D0', 'D1', 'D2', 'D3']
}, index=index)

# Concatenate DataFrames along columns
result = pd.concat([df1, df2], axis=1)
print(result)

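Because both DataFrames share the same MultiIndex, the output simply places them side by side: four rows, ('one', 'a') through ('two', 'b'), and columns A, B, C, D with no missing values.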

Example 17: Concatenation with Different DataTypes

import pandas as pd

# Create two DataFrames with different data types
df1 = pd.DataFrame({
    'A': [1, 2, 3],
    'B': [4.0, 5.0, 6.0]
})

df2 = pd.DataFrame({
    'A': ['one', 'two', 'three'],
    'B': ['four', 'five', 'six']
})

# Concatenate DataFrames
result = pd.concat([df1, df2])
print(result)

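The output has six rows (index 0, 1, 2, 0, 1, 2). Because each column now mixes numbers and strings, both A and B end up with the generic object dtype.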
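Mixing numbers with strings forces the object dtype, but numeric columns are upcast instead. The sketch below uses small hypothetical frames to compare the two cases:

```python
import pandas as pd

ints = pd.DataFrame({'A': [1, 2, 3]})
floats = pd.DataFrame({'A': [4.5, 5.5]})
words = pd.DataFrame({'A': ['one', 'two']})

# int64 combined with float64 is upcast to float64
numeric = pd.concat([ints, floats])['A']
# Numbers combined with strings fall back to object
mixed = pd.concat([ints, words])['A']

print(numeric.dtype)
print(mixed.dtype)
```

Checking dtypes after concatenation is a cheap way to catch columns that were meant to be numeric.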

Example 18: Concatenation Using concat with Dictionary

import pandas as pd

# Create a dictionary of DataFrames
dfs = {
    'df1': pd.DataFrame({
        'A': ['A0', 'A1', 'A2'],
        'B': ['B0', 'B1', 'B2']
    }),
    'df2': pd.DataFrame({
        'A': ['A3', 'A4', 'A5'],
        'B': ['B3', 'B4', 'B5']
    })
}

# Concatenate DataFrames from dictionary
result = pd.concat(dfs)
print(result)

The dictionary keys become the outer index level, so the output matches Example 4 with 'df1' and 'df2' in place of 'x' and 'y'.
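When objs is a dictionary, keys= can still be passed; according to the pandas documentation it then selects which entries to concatenate, in the order given. A sketch with a hypothetical third frame, df3:

```python
import pandas as pd

dfs = {
    'df1': pd.DataFrame({'A': ['A0', 'A1'], 'B': ['B0', 'B1']}),
    'df2': pd.DataFrame({'A': ['A2', 'A3'], 'B': ['B2', 'B3']}),
    'df3': pd.DataFrame({'A': ['A4', 'A5'], 'B': ['B4', 'B5']}),
}

# keys= picks df3 and df1 (in that order) and skips df2
subset = pd.concat(dfs, keys=['df3', 'df1'])
print(subset)
```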

Example 19: Concatenation with Axis Labels

import pandas as pd

# Create two DataFrames
df1 = pd.DataFrame({
    'A': ['A0', 'A1', 'A2'],
    'B': ['B0', 'B1', 'B2']
})

df2 = pd.DataFrame({
    'C': ['C0', 'C1', 'C2'],
    'D': ['D0', 'D1', 'D2']
})

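# Label each input with keys=; along axis=1 the keys become the outer column level
result = pd.concat([df1, df2], axis=1, keys=['Group1', 'Group2'])
print(result)

The output has a two-level column index: Group1 sits above columns A and B, and Group2 above columns C and D.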

Example 20: Concatenation with Mixed Dimensions

import pandas as pd

# Create a DataFrame and a Series
df = pd.DataFrame({
    'A': ['A0', 'A1', 'A2'],
    'B': ['B0', 'B1', 'B2']
})

series = pd.Series(['S0', 'S1', 'S2', 'S3'], name='S')

# Concatenate DataFrame with Series, treating Series as a DataFrame
result = pd.concat([df, series.to_frame()], axis=1)
print(result)

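The output has four rows: the Series has one more label (3) than the DataFrame, and the default outer join keeps it, so row 3 has NaN in columns A and B.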
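pd.concat also accepts the Series directly, as in Example 13, and join='inner' trims the result to the labels both objects share instead of padding with NaN. A short sketch with Example 20's data:

```python
import pandas as pd

df = pd.DataFrame({'A': ['A0', 'A1', 'A2'], 'B': ['B0', 'B1', 'B2']})
series = pd.Series(['S0', 'S1', 'S2', 'S3'], name='S')

# Outer join (default): label 3 from the Series adds a row with NaN in A and B
outer = pd.concat([df, series], axis=1)
# Inner join: only labels 0, 1, 2 are present in both, so no NaN appears
inner = pd.concat([df, series], axis=1, join='inner')

print(outer.shape, inner.shape)
```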

These examples provide a comprehensive guide on how to use the pd.concat() function in various scenarios to concatenate DataFrames in Pandas. Each example is designed to be self-contained and directly runnable, providing clear insights into the capabilities and flexibility of DataFrame concatenation in Pandas.