Pandas Concat Two DataFrames Vertically
In this article, we will explore how to concatenate two DataFrames vertically using the pandas library in Python. Pandas is a powerful tool for data manipulation and analysis, providing data structures and operations for manipulating numerical tables and time series. Concatenating DataFrames is a common operation in data preprocessing, merging, and transformation tasks.
Introduction to Concatenation
Concatenation refers to the process of appending one DataFrame below another. This is particularly useful when you have data in similar formats spread across multiple DataFrames and you need to analyze them as a single unit. Pandas provides various functions for performing concatenation, but the primary function used for this purpose is pd.concat().
Using pd.concat() for Vertical Concatenation
The pd.concat() function is versatile and can be used not only for vertical concatenation but also for horizontal concatenation (side by side). The key parameter to control the direction of concatenation is axis. For vertical concatenation, axis is set to 0, which is also the default value.
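To make the effect of axis concrete, here is a minimal sketch that compares the two directions on a pair of small frames. The names left and right are illustrative only and do not appear elsewhere in this article.

```python
import pandas as pd

# Two small frames with the same columns (hypothetical example data)
left = pd.DataFrame({'A': ['A0', 'A1'], 'B': ['B0', 'B1']})
right = pd.DataFrame({'A': ['A2', 'A3'], 'B': ['B2', 'B3']})

# axis=0 (the default) stacks rows: 2 + 2 rows, same 2 columns
stacked = pd.concat([left, right], axis=0)

# axis=1 places the frames side by side: same 2 rows, 2 + 2 columns
side_by_side = pd.concat([left, right], axis=1)

print(stacked.shape)       # (4, 2)
print(side_by_side.shape)  # (2, 4)
```

Comparing the shapes is a quick way to confirm that a concatenation went in the intended direction.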
Example 1: Basic Vertical Concatenation
import pandas as pd
# Create two DataFrames
df1 = pd.DataFrame({
'A': ['A0', 'A1', 'A2', 'A3'],
'B': ['B0', 'B1', 'B2', 'B3'],
'C': ['C0', 'C1', 'C2', 'C3'],
'D': ['D0', 'D1', 'D2', 'D3']
}, index=[0, 1, 2, 3])
df2 = pd.DataFrame({
'A': ['A4', 'A5', 'A6', 'A7'],
'B': ['B4', 'B5', 'B6', 'B7'],
'C': ['C4', 'C5', 'C6', 'C7'],
'D': ['D4', 'D5', 'D6', 'D7']
}, index=[4, 5, 6, 7])
result = pd.concat([df1, df2])
print(result)
Example 2: Concatenating with Non-Identical Columns
import pandas as pd
# Create two DataFrames with different columns
df1 = pd.DataFrame({
'A': ['A0', 'A1', 'A2', 'A3'],
'B': ['B0', 'B1', 'B2', 'B3']
}, index=[0, 1, 2, 3])
df2 = pd.DataFrame({
'C': ['C4', 'C5', 'C6', 'C7'],
'D': ['D4', 'D5', 'D6', 'D7']
}, index=[4, 5, 6, 7])
result = pd.concat([df1, df2], sort=False)
print(result)
Example 3: Ignoring the Index During Concatenation
import pandas as pd
# Create two DataFrames
df1 = pd.DataFrame({
'A': ['A0', 'A1', 'A2', 'A3'],
'B': ['B0', 'B1', 'B2', 'B3']
}, index=[0, 1, 2, 3])
df2 = pd.DataFrame({
'A': ['A4', 'A5', 'A6', 'A7'],
'B': ['B4', 'B5', 'B6', 'B7']
}, index=[4, 5, 6, 7])
result = pd.concat([df1, df2], ignore_index=True)
print(result)
Example 4: Adding Multi-level Index on Concatenation
import pandas as pd
# Create two DataFrames
df1 = pd.DataFrame({
'A': ['A0', 'A1', 'A2', 'A3'],
'B': ['B0', 'B1', 'B2', 'B3']
})
df2 = pd.DataFrame({
'A': ['A4', 'A5', 'A6', 'A7'],
'B': ['B4', 'B5', 'B6', 'B7']
})
result = pd.concat([df1, df2], keys=['df1', 'df2'])
print(result)
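Once keys are attached, the outer index level can be used to pull an original block back out of the combined frame. The following is a small sketch on shortened data; the level names source and row passed through the names parameter are illustrative choices.

```python
import pandas as pd

df1 = pd.DataFrame({'A': ['A0', 'A1']})
df2 = pd.DataFrame({'A': ['A2', 'A3']})

# keys labels each input; names labels the two index levels (illustrative names)
result = pd.concat([df1, df2], keys=['df1', 'df2'], names=['source', 'row'])

print(result.index.nlevels)  # 2

# Select only the rows that came from the second input
second_block = result.loc['df2']
print(second_block)
```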
Example 5: Concatenation with Different Indexes
import pandas as pd
# Create two DataFrames with different indexes
df1 = pd.DataFrame({
'A': ['A0', 'A1', 'A2', 'A3'],
'B': ['B0', 'B1', 'B2', 'B3']
}, index=[0, 1, 2, 3])
df2 = pd.DataFrame({
'A': ['A4', 'A5', 'A6', 'A7'],
'B': ['B4', 'B5', 'B6', 'B7']
}, index=[10, 11, 12, 13])
result = pd.concat([df1, df2])
print(result)
Example 6: Replacing append() with pd.concat()
Older pandas code often uses the DataFrame.append() method, which concatenates along the rows only (i.e., vertically). The method was deprecated in pandas 1.4 and removed in pandas 2.0, and the private df._append() method that remains is not part of the public API. In current pandas, the equivalent call is pd.concat() with ignore_index=True.
import pandas as pd
# Create two DataFrames
df1 = pd.DataFrame({
'A': ['A0', 'A1', 'A2', 'A3'],
'B': ['B0', 'B1', 'B2', 'B3']
})
df2 = pd.DataFrame({
'A': ['A4', 'A5', 'A6', 'A7'],
'B': ['B4', 'B5', 'B6', 'B7']
})
# Equivalent of the removed df1.append(df2, ignore_index=True)
result = pd.concat([df1, df2], ignore_index=True)
print(result)
Example 7: Handling Duplicate Indexes
When concatenating DataFrames with duplicate indexes, pandas will retain all index values, potentially resulting in duplicate index values in the resulting DataFrame. This might not be a problem, but if a unique index is required, additional steps such as resetting the index may be necessary.
import pandas as pd
# Create two DataFrames with duplicate indexes
df1 = pd.DataFrame({
'A': ['A0', 'A1', 'A2', 'A3'],
'B': ['B0', 'B1', 'B2', 'B3']
}, index=[0, 1, 2, 3])
df2 = pd.DataFrame({
'A': ['A4', 'A5', 'A6', 'A7'],
'B': ['B4', 'B5', 'B6', 'B7']
}, index=[2, 3, 4, 5])
result = pd.concat([df1, df2])
print(result)
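If a unique index is required, two options are worth sketching. The snippet below uses small hypothetical frames: reset_index(drop=True) renumbers the rows after the fact, while verify_integrity=True makes pd.concat() raise a ValueError when index values overlap.

```python
import pandas as pd

# Index value 1 appears in both frames (hypothetical example data)
df1 = pd.DataFrame({'A': ['A0', 'A1']}, index=[0, 1])
df2 = pd.DataFrame({'A': ['A2', 'A3']}, index=[1, 2])

combined = pd.concat([df1, df2])
print(combined.index.is_unique)  # False

# Option 1: discard the old labels and renumber from 0
renumbered = combined.reset_index(drop=True)
print(renumbered.index.tolist())  # [0, 1, 2, 3]

# Option 2: refuse to concatenate when labels overlap
overlap_detected = False
try:
    pd.concat([df1, df2], verify_integrity=True)
except ValueError:
    overlap_detected = True
print(overlap_detected)  # True
```

Renumbering suits cases where the original labels carry no meaning; the integrity check is the safer choice when the labels are identifiers that should never repeat.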
Example 8: Concatenating with Mixed Data Types
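When concatenating DataFrames whose columns have different data types, pandas automatically converts each column to a type that can hold all of its values (known as upcasting). In the example below, column A mixes integers and strings, so it ends up with the object dtype, while column B stays float64.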
import pandas as pd
# Create two Dataframes with mixed data types
df1 = pd.DataFrame({
'A': [1, 2, 3, 4],
'B': [1.1, 2.2, 3.3, 4.4]
})
df2 = pd.DataFrame({
'A': ['five', 'six', 'seven', 'eight'],
'B': [5.5, 6.6, 7.7, 8.8]
})
result = pd.concat([df1, df2], ignore_index=True)
print(result)
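Checking the dtype of a column after concatenation shows which conversion took place. Here is a minimal sketch with a hypothetical column x, covering an integer/float mix and an integer/string mix.

```python
import pandas as pd

ints = pd.DataFrame({'x': [1, 2]})
floats = pd.DataFrame({'x': [0.5, 1.5]})
texts = pd.DataFrame({'x': ['three', 'four']})

# Integers and floats share a numeric type, so the column becomes float64
int_float = pd.concat([ints, floats], ignore_index=True)

# Integers and strings have no common numeric type, so the column becomes object
int_text = pd.concat([ints, texts], ignore_index=True)

print(int_float['x'].dtype)  # float64
print(int_text['x'].dtype)   # object
```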
Example 9: Concatenating with Data Alignment
When the DataFrames being concatenated do not have the same set of columns, pandas will align columns by name and introduce NaN values for missing data in any of the DataFrames.
import pandas as pd
# Create two DataFrames with different columns
df1 = pd.DataFrame({
'A': ['A0', 'A1', 'A2', 'A3'],
'B': ['B0', 'B1', 'B2', 'B3']
})
df2 = pd.DataFrame({
'B': ['B4', 'B5', 'B6', 'B7'],
'C': ['C4', 'C5', 'C6', 'C7']
})
result = pd.concat([df1, df2], sort=False)
print(result)
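When the NaN padding is not wanted, the join parameter of pd.concat() can restrict the result to the columns that all inputs share. Here is a short sketch on shortened data, comparing the default join='outer' with join='inner'.

```python
import pandas as pd

df1 = pd.DataFrame({'A': ['A0', 'A1'], 'B': ['B0', 'B1']})
df2 = pd.DataFrame({'B': ['B2', 'B3'], 'C': ['C2', 'C3']})

# Default (outer): union of columns, missing cells filled with NaN
outer = pd.concat([df1, df2], ignore_index=True)

# Inner: only the columns present in every input
inner = pd.concat([df1, df2], join='inner', ignore_index=True)

print(list(outer.columns))  # ['A', 'B', 'C']
print(list(inner.columns))  # ['B']
```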
Example 10: Using Concatenation with Real-World Data
In real-world scenarios, data often comes in parts from different sources. Concatenating these parts vertically can help in creating a unified dataset for analysis.
import pandas as pd
# Simulate loading data from different sources
data_part1 = pd.DataFrame({
'CustomerID': [1, 2, 3, 4],
'Product': ['Product1', 'Product2', 'Product3', 'Product4']
})
data_part2 = pd.DataFrame({
'CustomerID': [5, 6, 7, 8],
'Product': ['Product5', 'Product6', 'Product7', 'Product8']
})
combined_data = pd.concat([data_part1, data_part2])
print(combined_data)
Example 11: Concatenation with Category Data Type
When dealing with categorical data, it's important to ensure that the categories are consistent across DataFrames. If the categories differ, as in the example below, pandas cannot keep the categorical dtype and the concatenated column falls back to the object dtype.
import pandas as pd
# Create categorical data
df1 = pd.DataFrame({
'Grade': pd.Categorical(['A', 'B', 'C', 'D'])
})
df2 = pd.DataFrame({
'Grade': pd.Categorical(['E', 'F'])
})
result = pd.concat([df1, df2])
print(result)
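One way to keep the categorical dtype is to give both inputs the same CategoricalDtype before concatenating. The sketch below assumes the full set of grades is known in advance; the name grade_dtype is illustrative.

```python
import pandas as pd
from pandas.api.types import CategoricalDtype

# A shared dtype listing every category either frame may contain (assumed known up front)
grade_dtype = CategoricalDtype(categories=['A', 'B', 'C', 'D', 'E', 'F'])

df1 = pd.DataFrame({'Grade': pd.Series(['A', 'B', 'C', 'D'], dtype=grade_dtype)})
df2 = pd.DataFrame({'Grade': pd.Series(['E', 'F'], dtype=grade_dtype)})

result = pd.concat([df1, df2], ignore_index=True)
print(result['Grade'].dtype)  # category
```

If the categories are not known ahead of time, another option is to concatenate first and convert the resulting column with astype('category') afterwards.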
Example 12: Concatenation and Memory Usage
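Concatenation builds a new DataFrame rather than extending an existing one, so the inputs and the result occupy memory at the same time until the inputs are released. With large DataFrames, it's important to monitor memory usage during these operations.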
import pandas as pd
# Create large DataFrames
df1 = pd.DataFrame({
'Data': range(100000)
})
df2 = pd.DataFrame({
'Data': range(100000, 200000)
})
result = pd.concat([df1, df2])
print(result.memory_usage())
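When data arrives in many pieces, a common pattern is to collect the pieces in a list and call pd.concat() once, rather than concatenating repeatedly inside a loop, since each call copies the data again. The sketch below uses small generated chunks (the name chunks is illustrative) and compares the total size of the inputs with the size of the result.

```python
import pandas as pd

# Build five small chunks; in practice these might come from separate files
chunks = [
    pd.DataFrame({'Data': range(i * 1000, (i + 1) * 1000)})
    for i in range(5)
]

# A single concat call over the whole list
combined = pd.concat(chunks, ignore_index=True)

# deep=True also counts the contents of object columns, if any
input_bytes = sum(chunk.memory_usage(deep=True).sum() for chunk in chunks)
result_bytes = combined.memory_usage(deep=True).sum()

print(f"inputs: {input_bytes} bytes, result: {result_bytes} bytes")
```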
Example 13: Concatenation with Date Ranges
Concatenating DataFrames that include date ranges can be useful for time series analysis.
import pandas as pd
# Create DataFrames with date ranges
date_range1 = pd.date_range('2023-01-01', periods=10, freq='D')
date_range2 = pd.date_range('2023-02-01', periods=10, freq='D')
df1 = pd.DataFrame({
'Date': date_range1,
'Value': range(10)
})
df2 = pd.DataFrame({
'Date': date_range2,
'Value': range(10, 20)
})
result = pd.concat([df1, df2])
print(result)
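For time series work, the combined frame is often sorted by date and indexed on it afterwards. The following is a small sketch under the assumption that the pieces may arrive out of chronological order; the frame names feb and jan are illustrative.

```python
import pandas as pd

# The later month is listed first on purpose
feb = pd.DataFrame({
    'Date': pd.date_range('2023-02-01', periods=3, freq='D'),
    'Value': [3, 4, 5]
})
jan = pd.DataFrame({
    'Date': pd.date_range('2023-01-01', periods=3, freq='D'),
    'Value': [0, 1, 2]
})

# Concatenate, restore chronological order, and index on the date
timeline = (
    pd.concat([feb, jan], ignore_index=True)
    .sort_values('Date')
    .set_index('Date')
)

print(timeline.index.is_monotonic_increasing)  # True

# With a DatetimeIndex, a month can be selected by partial date string
january = timeline.loc['2023-01']
print(january)
```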
Example 14: Concatenation with Different Languages
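Text in different languages needs no special handling during concatenation, because pandas stores strings as Unicode. Encoding matters mainly when the data is read from or written to files, so make sure each source is decoded correctly before concatenating.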
import pandas as pd
# Create DataFrames with text in different languages
df1 = pd.DataFrame({
'Text': ['Hello', 'World']
})
df2 = pd.DataFrame({
'Text': ['こんにちは', '世界']
})
result = pd.concat([df1, df2])
print(result)
Example 15: Advanced Concatenation with Custom Functions
Sometimes, you might need to apply custom functions to the data once it has been combined. This can be achieved by calling the apply() method on the concatenated DataFrame, which passes each column to the function as a Series.
import pandas as pd
# Define a custom function that appends a suffix to every value in a column
def add_suffix(series):
    return series.apply(lambda x: f"{x}_suffix")
# Create DataFrames
df1 = pd.DataFrame({
'Data': ['A', 'B', 'C']
})
df2 = pd.DataFrame({
'Data': ['D', 'E', 'F']
})
result = pd.concat([df1, df2]).apply(add_suffix)
print(result)
Conclusion
Concatenating DataFrames vertically is a fundamental operation in data manipulation with pandas. It allows for the integration of data from multiple sources into a single DataFrame, facilitating easier analysis and manipulation. Understanding how to effectively use pd.concat() and related functions is essential for any data scientist or analyst working with Python and pandas.