Pandas Concat Two DataFrames Vertically

In this article, we will explore how to concatenate two DataFrames vertically using the pandas library in Python. Pandas is a powerful tool for data manipulation and analysis, providing data structures and operations for manipulating numerical tables and time series. Concatenating DataFrames is a common operation in data preprocessing, merging, and transformation tasks.

Introduction to Concatenation

Vertical concatenation stacks the rows of one DataFrame below those of another. This is particularly useful when data in the same format is spread across multiple DataFrames and you need to analyze it as a single unit. Pandas offers more than one way to combine DataFrames, but the primary function for stacking them is pd.concat().

Using pd.concat() for Vertical Concatenation

The pd.concat() function is versatile and can be used not only for vertical concatenation but also for horizontal concatenation (side by side). The key parameter to control the direction of concatenation is axis. For vertical concatenation, axis is set to 0, which is also the default value.
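
To see the effect of axis directly, this short sketch concatenates the same two small DataFrames both ways and compares the resulting shapes. The frames top and bottom are illustrative only.

```python
import pandas as pd

# Two small illustrative DataFrames with identical columns
top = pd.DataFrame({'A': [1, 2], 'B': [3, 4]})
bottom = pd.DataFrame({'A': [5, 6], 'B': [7, 8]})

# axis=0 (the default) stacks rows; axis=1 places the frames side by side
vertical = pd.concat([top, bottom])
side_by_side = pd.concat([top, bottom], axis=1)

print(vertical.shape)      # (4, 2)
print(side_by_side.shape)  # (2, 4)
```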

Example 1: Basic Vertical Concatenation

import pandas as pd

# Create two DataFrames
df1 = pd.DataFrame({
    'A': ['A0', 'A1', 'A2', 'A3'],
    'B': ['B0', 'B1', 'B2', 'B3'],
    'C': ['C0', 'C1', 'C2', 'C3'],
    'D': ['D0', 'D1', 'D2', 'D3']
}, index=[0, 1, 2, 3])

df2 = pd.DataFrame({
    'A': ['A4', 'A5', 'A6', 'A7'],
    'B': ['B4', 'B5', 'B6', 'B7'],
    'C': ['C4', 'C5', 'C6', 'C7'],
    'D': ['D4', 'D5', 'D6', 'D7']
}, index=[4, 5, 6, 7])

result = pd.concat([df1, df2])
print(result)
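
Vertical concatenation matches columns by label, not by position. One way to confirm this is to give the second DataFrame the same columns in a different order; the frames here are trimmed-down, illustrative versions of df1 and df2.

```python
import pandas as pd

df1 = pd.DataFrame({'A': ['A0', 'A1'], 'B': ['B0', 'B1']})
# Same columns as df1, but listed in the opposite order
df2 = pd.DataFrame({'B': ['B2', 'B3'], 'A': ['A2', 'A3']})

result = pd.concat([df1, df2])

# Values land under the matching column label, regardless of column position
print(result.columns.tolist())  # ['A', 'B']
print(result['A'].tolist())     # ['A0', 'A1', 'A2', 'A3']
```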

Example 2: Concatenating with Non-Identical Columns

import pandas as pd

# Create two DataFrames with different columns
df1 = pd.DataFrame({
    'A': ['A0', 'A1', 'A2', 'A3'],
    'B': ['B0', 'B1', 'B2', 'B3']
}, index=[0, 1, 2, 3])

df2 = pd.DataFrame({
    'C': ['C4', 'C5', 'C6', 'C7'],
    'D': ['D4', 'D5', 'D6', 'D7']
}, index=[4, 5, 6, 7])

# sort=False (the default since pandas 1.0) keeps the columns in the order they first appear
result = pd.concat([df1, df2], sort=False)
print(result)
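
Because the rows from df1 have no values for C and D (and vice versa), those cells are filled with NaN. A side effect worth checking is the dtype: a NumPy integer column cannot hold NaN, so integer columns become float64. This sketch, using illustrative frames left and right, shows the effect and one way to get integers back with pandas' nullable Int64 dtype.

```python
import pandas as pd

left = pd.DataFrame({'x': [1, 2]})
right = pd.DataFrame({'y': [3, 4]})

out = pd.concat([left, right], ignore_index=True)
print(out.dtypes)        # x and y are both float64 because of the NaN fill
print(out.isna().sum())  # two missing values per column

# Convert to the nullable integer dtype, which stores missing values as <NA>
nullable = out.astype('Int64')
print(nullable)
```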

Example 3: Ignoring the Index During Concatenation

import pandas as pd

# Create two DataFrames
df1 = pd.DataFrame({
    'A': ['A0', 'A1', 'A2', 'A3'],
    'B': ['B0', 'B1', 'B2', 'B3']
}, index=[0, 1, 2, 3])

df2 = pd.DataFrame({
    'A': ['A4', 'A5', 'A6', 'A7'],
    'B': ['B4', 'B5', 'B6', 'B7']
}, index=[4, 5, 6, 7])

result = pd.concat([df1, df2], ignore_index=True)
print(result)
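
ignore_index=True has a close equivalent after the fact: reset_index(drop=True) also replaces the labels with 0 to n-1. Leaving out drop=True instead keeps the old labels in a new column named index, which can be handy when they carry meaning. The frames in this sketch are shortened, illustrative versions of the ones above.

```python
import pandas as pd

df1 = pd.DataFrame({'A': ['A0', 'A1']}, index=[0, 1])
df2 = pd.DataFrame({'A': ['A4', 'A5']}, index=[4, 5])

# ignore_index=True discards the original labels and numbers rows 0..n-1
via_ignore = pd.concat([df1, df2], ignore_index=True)

# Dropping the labels afterwards gives the same result
via_reset = pd.concat([df1, df2]).reset_index(drop=True)
print(via_ignore.equals(via_reset))  # True

# Without drop=True, the old labels are kept in a new 'index' column
with_old_labels = pd.concat([df1, df2]).reset_index()
print(with_old_labels)
```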

Example 4: Adding Multi-level Index on Concatenation

import pandas as pd

# Create two DataFrames
df1 = pd.DataFrame({
    'A': ['A0', 'A1', 'A2', 'A3'],
    'B': ['B0', 'B1', 'B2', 'B3']
})

df2 = pd.DataFrame({
    'A': ['A4', 'A5', 'A6', 'A7'],
    'B': ['B4', 'B5', 'B6', 'B7']
})

result = pd.concat([df1, df2], keys=['df1', 'df2'])
print(result)
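
The keys create an outer index level that records which input each row came from. A sketch of two follow-up steps, assuming smaller illustrative frames: the names parameter labels the index levels, .loc selects all rows from one source, and reset_index(level=...) turns the source level into an ordinary column.

```python
import pandas as pd

df1 = pd.DataFrame({'A': ['A0', 'A1'], 'B': ['B0', 'B1']})
df2 = pd.DataFrame({'A': ['A4', 'A5'], 'B': ['B4', 'B5']})

# names= labels the two index levels created by keys=
result = pd.concat([df1, df2], keys=['df1', 'df2'], names=['source', 'row'])

# Select every row that came from df2 via the outer level
print(result.loc['df2'])

# Move the outer level into an ordinary column
flat = result.reset_index(level='source')
print(flat)
```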

Example 5: Concatenation with Different Indexes

import pandas as pd

# Create two DataFrames with different indexes
df1 = pd.DataFrame({
    'A': ['A0', 'A1', 'A2', 'A3'],
    'B': ['B0', 'B1', 'B2', 'B3']
}, index=[0, 1, 2, 3])

df2 = pd.DataFrame({
    'A': ['A4', 'A5', 'A6', 'A7'],
    'B': ['B4', 'B5', 'B6', 'B7']
}, index=[10, 11, 12, 13])

result = pd.concat([df1, df2])
print(result)
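
pd.concat() keeps each row's original label and stacks the inputs in list order; it does not reorder rows by label. If the inputs are listed in a different order, sort_index() puts the labels back in ascending order, as in this small illustrative sketch.

```python
import pandas as pd

df1 = pd.DataFrame({'A': ['A0', 'A1']}, index=[0, 1])
df2 = pd.DataFrame({'A': ['A4', 'A5']}, index=[10, 11])

# Rows follow the order of the list passed to concat
result = pd.concat([df2, df1])
print(result.index.tolist())   # [10, 11, 0, 1]

# sort_index() reorders rows by their labels
ordered = result.sort_index()
print(ordered.index.tolist())  # [0, 1, 10, 11]
```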

Example 6: Replacing append() for Vertical Concatenation

Older versions of pandas also offered a DataFrame.append() method that added rows below a DataFrame. It was deprecated in pandas 1.4 and removed in pandas 2.0. Calling df1._append() instead relies on a private method (note the leading underscore) that is not part of the public API and may change without notice, so pd.concat() is the supported way to get the same result.

import pandas as pd

# Create two DataFrames
df1 = pd.DataFrame({
    'A': ['A0', 'A1', 'A2', 'A3'],
    'B': ['B0', 'B1', 'B2', 'B3']
})

df2 = pd.DataFrame({
    'A': ['A4', 'A5', 'A6', 'A7'],
    'B': ['B4', 'B5', 'B6', 'B7']
})

# In pandas < 2.0 this was written as df1.append(df2, ignore_index=True)
result = pd.concat([df1, df2], ignore_index=True)
print(result)
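
A common reason for reaching for append() was adding a single row. With pd.concat(), one way is to wrap the row in a one-row DataFrame first; new_row in this sketch is an illustrative dictionary keyed by column name.

```python
import pandas as pd

df = pd.DataFrame({'A': ['A0', 'A1'], 'B': ['B0', 'B1']})

# Illustrative new row, keyed by column name
new_row = {'A': 'A2', 'B': 'B2'}

# Wrap the dict in a one-row DataFrame, then stack it under the existing rows
df = pd.concat([df, pd.DataFrame([new_row])], ignore_index=True)
print(df)
```

When many rows arrive one at a time, collecting them in a list of dicts and building the DataFrame once with pd.DataFrame(rows) avoids copying the growing DataFrame on every step.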

Example 7: Handling Duplicate Indexes

When the DataFrames share index labels, pd.concat() keeps every label, so the result can contain duplicates; in this example, labels 2 and 3 appear twice. That is not always a problem, but label-based lookups such as result.loc[2] then return more than one row. If a unique index is required, pass ignore_index=True or reset the index after concatenating.

import pandas as pd

# Create two DataFrames with duplicate indexes
df1 = pd.DataFrame({
    'A': ['A0', 'A1', 'A2', 'A3'],
    'B': ['B0', 'B1', 'B2', 'B3']
}, index=[0, 1, 2, 3])

df2 = pd.DataFrame({
    'A': ['A4', 'A5', 'A6', 'A7'],
    'B': ['B4', 'B5', 'B6', 'B7']
}, index=[2, 3, 4, 5])

result = pd.concat([df1, df2])
print(result)
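
Duplicates can also be detected or prevented explicitly. This sketch, using single-column versions of the frames above, checks index.is_unique and then passes verify_integrity=True, which makes pd.concat() raise a ValueError instead of producing overlapping labels.

```python
import pandas as pd

df1 = pd.DataFrame({'A': ['A0', 'A1', 'A2', 'A3']}, index=[0, 1, 2, 3])
df2 = pd.DataFrame({'A': ['A4', 'A5', 'A6', 'A7']}, index=[2, 3, 4, 5])

result = pd.concat([df1, df2])
print(result.index.is_unique)  # False: labels 2 and 3 appear twice
print(result.loc[2])           # returns two rows, not one

# verify_integrity=True makes concat raise instead of producing duplicates
integrity_error = None
try:
    pd.concat([df1, df2], verify_integrity=True)
except ValueError as err:
    integrity_error = err
    print("Refused:", err)
```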

Example 8: Concatenating with Mixed Data Types

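When the same column has different data types in the input DataFrames, pandas picks a common data type for that column in the result. Numeric types are upcast (int64 combined with float64 becomes float64), but no numeric type can hold both numbers and strings, so column A in this example becomes an object column containing a mix of integers and strings.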

import pandas as pd

# Create two DataFrames with mixed data types
df1 = pd.DataFrame({
    'A': [1, 2, 3, 4],
    'B': [1.1, 2.2, 3.3, 4.4]
})

df2 = pd.DataFrame({
    'A': ['five', 'six', 'seven', 'eight'],
    'B': [5.5, 6.6, 7.7, 8.8]
})

result = pd.concat([df1, df2], ignore_index=True)
print(result)
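
Checking dtypes after concatenating is a cheap way to catch unintended conversions. This sketch, with illustrative frames ints, floats and words, contrasts a numeric upcast with the int-plus-string case, and uses pd.to_numeric(errors='coerce') to flag the entries that are not numbers.

```python
import pandas as pd

ints = pd.DataFrame({'A': [1, 2]})
floats = pd.DataFrame({'A': [1.5, 2.5]})
words = pd.DataFrame({'A': ['five', 'six']})

# int64 + float64: numeric upcast to float64
int_float = pd.concat([ints, floats], ignore_index=True)
print(int_float['A'].dtype)  # float64

# int64 + strings: no numeric common type, so the column is no longer numeric
int_str = pd.concat([ints, words], ignore_index=True)
print(int_str['A'].dtype)    # object in pandas 1.x/2.x

# to_numeric with errors='coerce' turns non-numeric entries into NaN
as_numbers = pd.to_numeric(int_str['A'], errors='coerce')
print(as_numbers.isna().sum())  # 2
```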

Example 9: Concatenating with Data Alignment

When the DataFrames being concatenated do not have the same set of columns, pandas will align columns by name and introduce NaN values for missing data in any of the DataFrames.

import pandas as pd

# Create two DataFrames with different columns
df1 = pd.DataFrame({
    'A': ['A0', 'A1', 'A2', 'A3'],
    'B': ['B0', 'B1', 'B2', 'B3']
})

df2 = pd.DataFrame({
    'B': ['B4', 'B5', 'B6', 'B7'],
    'C': ['C4', 'C5', 'C6', 'C7']
})

result = pd.concat([df1, df2], sort=False)
print(result)
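
The NaN-filling behavior comes from the default join='outer', which keeps the union of all columns. Passing join='inner' keeps only the columns every input shares. A sketch with shortened versions of the frames above:

```python
import pandas as pd

df1 = pd.DataFrame({'A': ['A0', 'A1'], 'B': ['B0', 'B1']})
df2 = pd.DataFrame({'B': ['B4', 'B5'], 'C': ['C4', 'C5']})

# join='outer' (the default) keeps the union of the columns and fills gaps with NaN
outer = pd.concat([df1, df2], ignore_index=True)
print(outer.columns.tolist())  # ['A', 'B', 'C']

# join='inner' keeps only the columns that every input has
inner = pd.concat([df1, df2], join='inner', ignore_index=True)
print(inner)
```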

Example 10: Using Concatenation with Real-World Data

In real-world scenarios, data often comes in parts from different sources. Concatenating these parts vertically can help in creating a unified dataset for analysis.

import pandas as pd

# Simulate loading data from different sources
data_part1 = pd.DataFrame({
    'CustomerID': [1, 2, 3, 4],
    'Product': ['Product1', 'Product2', 'Product3', 'Product4']
})

data_part2 = pd.DataFrame({
    'CustomerID': [5, 6, 7, 8],
    'Product': ['Product5', 'Product6', 'Product7', 'Product8']
})

combined_data = pd.concat([data_part1, data_part2])
print(combined_data)
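
combined_data above repeats the index labels 0 to 3, because each part keeps its own index. When the parts come from a loop (one per file, say), a common pattern is to collect them in a list and call pd.concat() once with ignore_index=True. In this sketch, load_part() is a hypothetical stand-in for whatever actually reads a source, and the Source column is an illustrative way to record where each row came from.

```python
import pandas as pd

def load_part(part_number):
    """Hypothetical stand-in for reading one source, such as a CSV file."""
    start = part_number * 4 + 1
    ids = list(range(start, start + 4))
    return pd.DataFrame({'CustomerID': ids, 'Product': [f'Product{i}' for i in ids]})

# Collect every part first, tagging each with its origin
parts = []
for n in range(3):
    parts.append(load_part(n).assign(Source=f'part{n + 1}'))

# A single concat call, with a fresh 0..n-1 index
combined = pd.concat(parts, ignore_index=True)
print(combined.shape)  # (12, 3)
print(combined['Source'].value_counts())
```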

Example 11: Concatenation with Category Data Type

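When dealing with categorical data, make sure the categories match across DataFrames. If they differ, as they do here (A, B, C, D versus E, F), pd.concat() cannot keep the categorical dtype, and the combined Grade column falls back to a non-categorical dtype (object in pandas 1.x and 2.x).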

import pandas as pd

# Create categorical data
df1 = pd.DataFrame({
    'Grade': pd.Categorical(['A', 'B', 'C', 'D'])
})

df2 = pd.DataFrame({
    'Grade': pd.Categorical(['E', 'F'])
})

result = pd.concat([df1, df2])
print(result)
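
One way to keep the categorical dtype is to give both inputs the same CategoricalDtype before concatenating. In this sketch the full category list and ordered=True are illustrative choices; pandas.api.types.union_categoricals is another option when working with Categorical objects directly.

```python
import pandas as pd

df1 = pd.DataFrame({'Grade': pd.Categorical(['A', 'B', 'C', 'D'])})
df2 = pd.DataFrame({'Grade': pd.Categorical(['E', 'F'])})

# Different categories: the result is no longer categorical
mixed = pd.concat([df1, df2], ignore_index=True)
print(mixed['Grade'].dtype)  # object in pandas 1.x/2.x

# Give both inputs the same CategoricalDtype before concatenating
grade_type = pd.CategoricalDtype(categories=['A', 'B', 'C', 'D', 'E', 'F'], ordered=True)
aligned = pd.concat(
    [df1.astype({'Grade': grade_type}), df2.astype({'Grade': grade_type})],
    ignore_index=True,
)
print(aligned['Grade'].dtype)  # category
print(aligned['Grade'].min())  # A, since the categories are ordered
```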

Example 12: Concatenation and Memory Usage

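pd.concat() builds a new DataFrame and generally copies the input data into it, so while the inputs are still referenced, memory holds both the originals and the combined result. With large DataFrames it is worth checking memory usage. The pandas documentation also advises against calling pd.concat() repeatedly in a loop, since each call copies all the data accumulated so far; collecting the pieces in a list and concatenating once is much cheaper.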

import pandas as pd

# Create large DataFrames
df1 = pd.DataFrame({
    'Data': range(100000)
})

df2 = pd.DataFrame({
    'Data': range(100000, 200000)
})

result = pd.concat([df1, df2])
print(result.memory_usage())
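
Part of the memory reported above belongs to the index. Each input has a RangeIndex starting at 0, and stacking two of them produces a materialized integer index with one label per row. Passing ignore_index=True gives the result a fresh RangeIndex instead, which stores only its start, stop and step. This comparison is a sketch; exact byte counts vary by pandas version and platform.

```python
import pandas as pd

df1 = pd.DataFrame({'Data': range(100000)})
df2 = pd.DataFrame({'Data': range(100000, 200000)})

# Default: labels 0..99999 appear twice, so the index is stored row by row
default_idx = pd.concat([df1, df2])

# ignore_index=True: a new RangeIndex 0..199999 with near-constant size
fresh_idx = pd.concat([df1, df2], ignore_index=True)

print(type(default_idx.index).__name__)
print(type(fresh_idx.index).__name__)  # RangeIndex
print(default_idx.memory_usage()['Index'], fresh_idx.memory_usage()['Index'])
```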

Example 13: Concatenation with Date Ranges

Concatenating DataFrames that cover different date ranges is a common way to assemble a longer time series from separate pieces, such as monthly extracts.

import pandas as pd

# Create DataFrames with date ranges
date_range1 = pd.date_range('2023-01-01', periods=10, freq='D')
date_range2 = pd.date_range('2023-02-01', periods=10, freq='D')

df1 = pd.DataFrame({
    'Date': date_range1,
    'Value': range(10)
})

df2 = pd.DataFrame({
    'Date': date_range2,
    'Value': range(10, 20)
})

result = pd.concat([df1, df2])
print(result)
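
Concatenation stacks rows in the order the inputs are listed; it does not sort by date. If the parts might arrive out of order, sorting on Date and making it the index restores chronological order and enables date-based slicing. The names df_jan, df_feb and ts are illustrative.

```python
import pandas as pd

df_jan = pd.DataFrame({'Date': pd.date_range('2023-01-01', periods=10, freq='D'),
                       'Value': range(10)})
df_feb = pd.DataFrame({'Date': pd.date_range('2023-02-01', periods=10, freq='D'),
                       'Value': range(10, 20)})

# Suppose the February part is listed first
combined = pd.concat([df_feb, df_jan], ignore_index=True)

# Sort chronologically and use the dates as the index
ts = combined.sort_values('Date').set_index('Date')
print(ts.index.is_monotonic_increasing)  # True

# Label slicing with date strings is inclusive on both ends
window = ts.loc['2023-01-09':'2023-02-02']
print(window)
```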

Example 14: Concatenation with Different Languages

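Text in different languages needs no special treatment inside pandas: string values are stored as Python Unicode strings, so pd.concat() combines English and Japanese text like any other values. Encoding only matters at the file boundary, where the encoding argument of read_csv() and to_csv() must match the file's actual encoding (both default to UTF-8).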

import pandas as pd

# Create DataFrames with text in different languages
df1 = pd.DataFrame({
    'Text': ['Hello', 'World']
})

df2 = pd.DataFrame({
    'Text': ['こんにちは', '世界']
})

result = pd.concat([df1, df2])
print(result)
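
To check that mixed-language text survives a trip through a file, this sketch writes the concatenated result as UTF-8 CSV bytes and reads it back. An in-memory io.BytesIO buffer stands in for a real file here.

```python
import io

import pandas as pd

df1 = pd.DataFrame({'Text': ['Hello', 'World']})
df2 = pd.DataFrame({'Text': ['こんにちは', '世界']})
result = pd.concat([df1, df2], ignore_index=True)

# Encode the CSV text explicitly as UTF-8, as it would be stored on disk
csv_bytes = result.to_csv(index=False).encode('utf-8')

# Read it back, declaring the same encoding
restored = pd.read_csv(io.BytesIO(csv_bytes), encoding='utf-8')
print(restored['Text'].tolist())
```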

Example 15: Advanced Concatenation with Custom Functions

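pd.concat() itself has no parameter for transforming values, so custom logic is applied either to each DataFrame before concatenating or to the combined result afterwards. Here DataFrame.apply() passes each column of the concatenated result to add_suffix() as a Series, which then appends a suffix to every value.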

import pandas as pd

# Define a custom function to modify data
def add_suffix(series):
    return series.apply(lambda x: f"{x}_suffix")

# Create DataFrames
df1 = pd.DataFrame({
    'Data': ['A', 'B', 'C']
})

df2 = pd.DataFrame({
    'Data': ['D', 'E', 'F']
})

result = pd.concat([df1, df2]).apply(add_suffix)
print(result)
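
For a simple transformation like this, a vectorized string operation on the combined column avoids calling a Python function once per value and is typically faster on large data. The sketch also shows the other placement, transforming each input before concatenating; the suffixes _first and _second are illustrative.

```python
import pandas as pd

df1 = pd.DataFrame({'Data': ['A', 'B', 'C']})
df2 = pd.DataFrame({'Data': ['D', 'E', 'F']})

# Option 1: transform the combined result with a vectorized string operation
combined = pd.concat([df1, df2], ignore_index=True)
combined['Data'] = combined['Data'] + '_suffix'
print(combined['Data'].tolist())

# Option 2: transform each input before concatenating, e.g. with different suffixes
tagged = pd.concat(
    [df1.assign(Data=df1['Data'] + '_first'), df2.assign(Data=df2['Data'] + '_second')],
    ignore_index=True,
)
print(tagged['Data'].tolist())
```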

Conclusion

Concatenating DataFrames vertically is a fundamental operation in data manipulation with pandas. It allows for the integration of data from multiple sources into a single DataFrame, facilitating easier analysis and manipulation. Understanding how to effectively use pd.concat() and related functions is essential for any data scientist or analyst working with Python and pandas.