Pandas DataFrame Append

Pandas is a powerful data manipulation library in Python that provides data structures and functions for effectively manipulating structured data. One of the key data structures in Pandas is the DataFrame, which can be thought of as a table or a spreadsheet. In this article, we will explore how to append data to a DataFrame using various methods and scenarios.

Appending data to a DataFrame is a common operation in data analysis and manipulation tasks. It involves adding new rows or columns to an existing DataFrame, thereby expanding the dataset. Pandas provides several ways to perform this operation, each suited to different scenarios and requirements.

1. Appending Rows to a DataFrame

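One of the most common operations is appending rows to a DataFrame. Older code does this with the DataFrame.append() method, but append() was deprecated in pandas 1.4 and removed in pandas 2.0. The examples below therefore use pd.concat(): wrap the new row or rows in a DataFrame and concatenate it with the existing one. Some tutorials call the private DataFrame._append() method instead; it is not part of the public API and may change without notice.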
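For a single row, another option is label-based assignment with .loc, which modifies the DataFrame in place. The sketch below assumes the DataFrame has the default RangeIndex (0, 1, ..., n-1), so len(df) is guaranteed to be an unused label; with any other index this could overwrite an existing row instead of adding one.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Alice', 'Bob'],
    'Website': ['pandasdataframe.com', 'pandasdataframe.com'],
    'Age': [25, 30]
})

# Assumes a default RangeIndex: len(df) is the next unused integer label,
# so this "setting with enlargement" adds a new row at the end
df.loc[len(df)] = ['Charlie', 'pandasdataframe.com', 35]

print(df)
```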

Example 1: Appending a Single Row Using a Dictionary

import pandas as pd

# Create a DataFrame
df = pd.DataFrame({
    'Name': ['Alice', 'Bob'],
    'Website': ['pandasdataframe.com', 'pandasdataframe.com'],
    'Age': [25, 30]
})

# Create a dictionary representing a new row
new_row = {'Name': 'Charlie', 'Website': 'pandasdataframe.com', 'Age': 35}

# Append the row: wrap it in a one-row DataFrame and concatenate
df = pd.concat([df, pd.DataFrame([new_row])], ignore_index=True)
print(df)

Example 2: Appending Multiple Rows Using a List of Dictionaries

import pandas as pd

# Create a DataFrame
df = pd.DataFrame({
    'Name': ['Alice', 'Bob'],
    'Website': ['pandasdataframe.com', 'pandasdataframe.com'],
    'Age': [25, 30]
})

# Create a list of dictionaries representing new rows
new_rows = [
    {'Name': 'Charlie', 'Website': 'pandasdataframe.com', 'Age': 35},
    {'Name': 'David', 'Website': 'pandasdataframe.com', 'Age': 40}
]

# Append the rows: build a DataFrame from the list and concatenate
df = pd.concat([df, pd.DataFrame(new_rows)], ignore_index=True)
print(df)

2. Appending DataFrames

Another common scenario is appending one DataFrame to another. This is useful when you have data split across multiple DataFrames and you want to combine them into a single DataFrame.
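When the pieces arrive one at a time, for example inside a loop, a common pattern is to collect them in a Python list and call pd.concat() once at the end; concatenating inside the loop copies the accumulated data on every iteration. A minimal sketch, where the hypothetical batches list stands in for whatever actually produces your data:

```python
import pandas as pd

# Hypothetical input: each batch holds the records for one chunk of data
batches = [
    [{'Name': 'Alice', 'Age': 25}, {'Name': 'Bob', 'Age': 30}],
    [{'Name': 'Charlie', 'Age': 35}],
    [{'Name': 'David', 'Age': 40}, {'Name': 'Eve', 'Age': 45}],
]

frames = []
for batch in batches:
    # Build one DataFrame per batch and keep it in a list
    frames.append(pd.DataFrame(batch))

# Concatenate once at the end instead of growing a DataFrame in the loop
df = pd.concat(frames, ignore_index=True)
print(df)
```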

Example 3: Appending Two DataFrames

import pandas as pd

# Create two DataFrames
df1 = pd.DataFrame({
    'Name': ['Alice', 'Bob'],
    'Website': ['pandasdataframe.com', 'pandasdataframe.com'],
    'Age': [25, 30]
})

df2 = pd.DataFrame({
    'Name': ['Charlie', 'David'],
    'Website': ['pandasdataframe.com', 'pandasdataframe.com'],
    'Age': [35, 40]
})

# Append df2 to df1
df = pd.concat([df1, df2], ignore_index=True)
print(df)

Example 4: Appending Multiple DataFrames Using Concat

import pandas as pd

# Create three DataFrames
df1 = pd.DataFrame({
    'Name': ['Alice', 'Bob'],
    'Website': ['pandasdataframe.com', 'pandasdataframe.com'],
    'Age': [25, 30]
})

df2 = pd.DataFrame({
    'Name': ['Charlie', 'David'],
    'Website': ['pandasdataframe.com', 'pandasdataframe.com'],
    'Age': [35, 40]
})

df3 = pd.DataFrame({
    'Name': ['Eve', 'Frank'],
    'Website': ['pandasdataframe.com', 'pandasdataframe.com'],
    'Age': [45, 50]
})

# Use concat to append all DataFrames
df = pd.concat([df1, df2, df3], ignore_index=True)
print(df)

3. Appending Columns to a DataFrame

In addition to appending rows, you might also need to append columns to a DataFrame. This can be done by simply assigning new columns to the DataFrame.
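Column assignment also broadcasts a scalar to every row, and DataFrame.assign() adds columns without touching the original, returning a new DataFrame instead. A small sketch; the Score values and the AgeNextYear column are made up for illustration:

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Alice', 'Bob'],
    'Age': [25, 30]
})

# A scalar is broadcast to every row
df['Website'] = 'pandasdataframe.com'

# assign() returns a new DataFrame; df itself is left unchanged
scored = df.assign(Score=[88, 92], AgeNextYear=df['Age'] + 1)
print(scored)
```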

Example 5: Appending a Single Column

import pandas as pd

# Create a DataFrame
df = pd.DataFrame({
    'Name': ['Alice', 'Bob'],
    'Age': [25, 30]
})

# Append a new column
df['Website'] = ['pandasdataframe.com', 'pandasdataframe.com']
print(df)

Example 6: Appending Multiple Columns Using a DataFrame

import pandas as pd

# Create a DataFrame
df = pd.DataFrame({
    'Name': ['Alice', 'Bob'],
    'Age': [25, 30]
})

# Create another DataFrame with new columns
new_columns = pd.DataFrame({
    'Website': ['pandasdataframe.com', 'pandasdataframe.com'],
    'Score': [88, 92]
})

# Append the new columns side by side (axis=1 aligns rows by index label)
df = pd.concat([df, new_columns], axis=1)
print(df)

4. Handling Indexes When Appending

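When appending data, pay attention to the row index. pd.concat() keeps the original index labels by default, so concatenating two DataFrames that both have the default index 0, 1 produces the duplicate labels 0, 1, 0, 1. You can discard the old labels with ignore_index=True, rebuild the index afterwards with reset_index(drop=True), or pass verify_integrity=True to raise an error whenever duplicate labels would be created.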
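The three behaviours can be compared side by side. A minimal sketch with two small DataFrames that share the default index:

```python
import pandas as pd

df1 = pd.DataFrame({'Name': ['Alice', 'Bob'], 'Age': [25, 30]})
df2 = pd.DataFrame({'Name': ['Charlie', 'David'], 'Age': [35, 40]})

# Default: the original labels are kept, so the result has 0, 1, 0, 1
combined = pd.concat([df1, df2])
print(combined.index.tolist())

# verify_integrity=True raises a ValueError instead of creating duplicates
raised = False
try:
    pd.concat([df1, df2], verify_integrity=True)
except ValueError as err:
    raised = True
    print('Refused:', err)

# ignore_index=True builds a fresh 0..n-1 index
clean = pd.concat([df1, df2], ignore_index=True)
print(clean.index.tolist())
```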

Example 7: Resetting Index After Append

import pandas as pd

# Create two DataFrames
df1 = pd.DataFrame({
    'Name': ['Alice', 'Bob'],
    'Website': ['pandasdataframe.com', 'pandasdataframe.com'],
    'Age': [25, 30]
})

df2 = pd.DataFrame({
    'Name': ['Charlie', 'David'],
    'Website': ['pandasdataframe.com', 'pandasdataframe.com'],
    'Age': [35, 40]
})

# Append df2 to df1, then replace the duplicated labels (0, 1, 0, 1) with a fresh 0..n-1 index
df = pd.concat([df1, df2]).reset_index(drop=True)
print(df)

Example 8: Using Concat with Sort

import pandas as pd

# Create three DataFrames
df1 = pd.DataFrame({
    'Name': ['Alice', 'Bob'],
    'Website': ['pandasdataframe.com', 'pandasdataframe.com'],
    'Age': [25, 30]
})

df2 = pd.DataFrame({
    'Name': ['Charlie', 'David'],
    'Website': ['pandasdataframe.com', 'pandasdataframe.com'],
    'Age': [35, 40]
})

df3 = pd.DataFrame({
    'Name': ['Eve', 'Frank'],
    'Website': ['pandasdataframe.com', 'pandasdataframe.com'],
    'Age': [45, 50]
})

# Concatenate all DataFrames; sort=True sorts the column labels
# (the non-concatenation axis), not the rows or the row index
df = pd.concat([df1, df2, df3], ignore_index=True, sort=True)
print(df)

5. Appending with Different Column Names

Sometimes the DataFrames you want to append do not share the same column names. pd.concat() matches columns by name, so mismatched names end up as separate columns padded with NaN. Use the rename method to align the column names before appending.
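If the second DataFrame's columns are already in the same order as the first's and only the names differ, an alternative to writing out a rename mapping is to relabel them by position with set_axis(). This sketch assumes the order really matches; nothing checks that PersonName actually corresponds to Name.

```python
import pandas as pd

df1 = pd.DataFrame({
    'Name': ['Alice', 'Bob'],
    'Website': ['pandasdataframe.com', 'pandasdataframe.com'],
    'Age': [25, 30]
})
df2 = pd.DataFrame({
    'PersonName': ['Charlie', 'David'],
    'Site': ['pandasdataframe.com', 'pandasdataframe.com'],
    'Years': [35, 40]
})

# Relabel df2's columns by position (assumes the column order matches df1)
df2_aligned = df2.set_axis(df1.columns, axis=1)

df = pd.concat([df1, df2_aligned], ignore_index=True)
print(df)
```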

Example 9: Aligning Column Names Using Rename

import pandas as pd

# Create two DataFrames with different column names
df1 = pd.DataFrame({
    'Name': ['Alice', 'Bob'],
    'Website': ['pandasdataframe.com', 'pandasdataframe.com'],
    'Age': [25, 30]
})

df2 = pd.DataFrame({
    'PersonName': ['Charlie', 'David'],
    'Site': ['pandasdataframe.com', 'pandasdataframe.com'],
    'Years': [35, 40]
})

# Rename columns in df2 to match df1 (rename returns a new DataFrame)
df2 = df2.rename(columns={'PersonName': 'Name', 'Site': 'Website', 'Years': 'Age'})

# Append df2 to df1
df = pd.concat([df1, df2], ignore_index=True)
print(df)

6. Appending with Missing Columns

When appending DataFrames, one DataFrame may have columns that the other lacks. pd.concat() keeps the union of the columns and fills the missing entries with NaN.
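If you would rather keep only the columns that both DataFrames share, pd.concat() accepts join='inner'; the default, join='outer', produces the union of columns with the NaN fill. A short sketch:

```python
import pandas as pd

df1 = pd.DataFrame({
    'Name': ['Alice', 'Bob'],
    'Website': ['pandasdataframe.com', 'pandasdataframe.com'],
    'Age': [25, 30]
})
df2 = pd.DataFrame({
    'Name': ['Charlie', 'David'],
    'Website': ['pandasdataframe.com', 'pandasdataframe.com'],
    'Score': [88, 92]
})

# Keep only the columns present in every input
shared = pd.concat([df1, df2], join='inner', ignore_index=True)
print(shared)
```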

Example 10: Appending DataFrames with Missing Columns

import pandas as pd

# Create two DataFrames with different columns
df1 = pd.DataFrame({
    'Name': ['Alice', 'Bob'],
    'Website': ['pandasdataframe.com', 'pandasdataframe.com'],
    'Age': [25, 30]
})

df2 = pd.DataFrame({
    'Name': ['Charlie', 'David'],
    'Website': ['pandasdataframe.com', 'pandasdataframe.com'],
    'Score': [88, 92]
})

# Append df2 to df1
df = pd.concat([df1, df2], ignore_index=True)
print(df)

In the resulting DataFrame, the ‘Age’ column for the rows from df2 will be filled with NaN, and the ‘Score’ column for the rows from df1 will be filled with NaN.
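The NaN fill has a side effect on dtypes: NaN is a floating-point value, so the integer Age and Score columns come out as float64. If you need integers back, one option is pandas' nullable Int64 dtype, which stores missing values as <NA>. A sketch reusing the DataFrames from Example 10:

```python
import pandas as pd

df1 = pd.DataFrame({
    'Name': ['Alice', 'Bob'],
    'Website': ['pandasdataframe.com', 'pandasdataframe.com'],
    'Age': [25, 30]
})
df2 = pd.DataFrame({
    'Name': ['Charlie', 'David'],
    'Website': ['pandasdataframe.com', 'pandasdataframe.com'],
    'Score': [88, 92]
})

df = pd.concat([df1, df2], ignore_index=True)
print(df.dtypes)  # Age and Score are float64 because of the NaN fill

# Convert back to integers, keeping the missing values as <NA>
converted = df.astype({'Age': 'Int64', 'Score': 'Int64'})
print(converted.dtypes)
```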

7. Appending with Different Data Types

Pandas also handles appending columns with different data types. If a column has a different dtype in each DataFrame, pandas chooses a common dtype that can hold all of the values: for example, int64 combined with float64 gives float64, while integers combined with strings give object.

Example 11: Appending DataFrames with Different Data Types

import pandas as pd

# Create two DataFrames with different data types
df1 = pd.DataFrame({
    'Name': ['Alice', 'Bob'],
    'Website': ['pandasdataframe.com', 'pandasdataframe.com'],
    'Age': [25, 30]
})

df2 = pd.DataFrame({
    'Name': ['Charlie', 'David'],
    'Website': ['pandasdataframe.com', 'pandasdataframe.com'],
    'Age': ['35', '40']
})

# Append df2 to df1
df = pd.concat([df1, df2], ignore_index=True)
print(df)

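In the resulting DataFrame, the ‘Age’ column has the object dtype, which can hold both integers and strings. The values themselves are not converted, though: '35' and '40' remain strings, so numeric operations on the column will not work as expected.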
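To keep Age numeric, convert the strings before appending. A sketch using pd.to_numeric(), assuming every string in the column is a valid number (otherwise it raises; errors='coerce' would turn invalid values into NaN instead):

```python
import pandas as pd

df1 = pd.DataFrame({
    'Name': ['Alice', 'Bob'],
    'Website': ['pandasdataframe.com', 'pandasdataframe.com'],
    'Age': [25, 30]
})
df2 = pd.DataFrame({
    'Name': ['Charlie', 'David'],
    'Website': ['pandasdataframe.com', 'pandasdataframe.com'],
    'Age': ['35', '40']
})

# Convert the string ages to numbers so both Age columns are numeric
df2['Age'] = pd.to_numeric(df2['Age'])

df = pd.concat([df1, df2], ignore_index=True)
print(df['Age'].dtype)
```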

8. Appending with Duplicate Rows

When appending DataFrames, you may end up with duplicate rows. pd.concat() keeps them; to remove them, call drop_duplicates() after appending. By default it treats rows as duplicates only when every column matches, and it keeps the first occurrence.
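drop_duplicates() also accepts subset, to compare only some columns, and keep ('first', 'last' or False). Because it removes rows without renumbering, a frame appended with ignore_index=True is left with gaps in its index, which reset_index(drop=True) closes. A sketch that treats rows with the same Name as duplicates and keeps the later one; Bob's second age of 31 is illustrative data:

```python
import pandas as pd

df1 = pd.DataFrame({
    'Name': ['Alice', 'Bob'],
    'Website': ['pandasdataframe.com', 'pandasdataframe.com'],
    'Age': [25, 30]
})
df2 = pd.DataFrame({
    'Name': ['Bob', 'Charlie'],
    'Website': ['pandasdataframe.com', 'pandasdataframe.com'],
    'Age': [31, 35]
})

df = (
    pd.concat([df1, df2], ignore_index=True)
    # Rows with the same Name count as duplicates; keep the later one
    .drop_duplicates(subset=['Name'], keep='last')
    # Renumber the remaining rows 0..n-1
    .reset_index(drop=True)
)
print(df)
```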

Example 12: Removing Duplicate Rows After Append

import pandas as pd

# Create two DataFrames with a duplicate row
df1 = pd.DataFrame({
    'Name': ['Alice', 'Bob'],
    'Website': ['pandasdataframe.com', 'pandasdataframe.com'],
    'Age': [25, 30]
})

df2 = pd.DataFrame({
    'Name': ['Bob', 'Charlie'],
    'Website': ['pandasdataframe.com', 'pandasdataframe.com'],
    'Age': [30, 35]
})

# Append df2 to df1, then drop rows that are duplicated across all columns
df = pd.concat([df1, df2], ignore_index=True).drop_duplicates()
print(df)

9. Appending with Different Indexes

If the DataFrames you are appending have different indexes, pd.concat() keeps the original index labels by default (here 1, 2, 3, 4). Pass ignore_index=True to replace them with a new index running from 0 to n-1.
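Instead of discarding the original labels, you can record which DataFrame each row came from with the keys parameter, which adds an outer level to the index. The key names 'first' and 'second' below are arbitrary:

```python
import pandas as pd

df1 = pd.DataFrame({'Name': ['Alice', 'Bob'], 'Age': [25, 30]}, index=[1, 2])
df2 = pd.DataFrame({'Name': ['Charlie', 'David'], 'Age': [35, 40]}, index=[3, 4])

# keys= adds an outer index level naming the source of each row
df = pd.concat([df1, df2], keys=['first', 'second'])
print(df)

# Select all rows that came from df2
print(df.loc['second'])
```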

Example 13: Appending DataFrames with Different Indexes

import pandas as pd

# Create two DataFrames with different indexes
df1 = pd.DataFrame({
    'Name': ['Alice', 'Bob'],
    'Website': ['pandasdataframe.com', 'pandasdataframe.com'],
    'Age': [25, 30]
}, index=[1, 2])

df2 = pd.DataFrame({
    'Name': ['Charlie', 'David'],
    'Website': ['pandasdataframe.com', 'pandasdataframe.com'],
    'Age': [35, 40]
}, index=[3, 4])

# Append df2 to df1, discarding the original labels 1-4 in favour of 0..n-1
df = pd.concat([df1, df2], ignore_index=True)
print(df)

10. Appending with Sort

By default, pd.concat() does not sort the columns: when the inputs list their columns in different orders, the result uses the order in which each column first appears. Pass sort=True to sort the column labels instead.
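sort=True only affects the column order; the rows stay in concatenation order. To order the rows, sort after appending, for example with sort_values(), whose ignore_index=True argument renumbers the result. A sketch:

```python
import pandas as pd

df1 = pd.DataFrame({'Name': ['Bob', 'Alice'], 'Age': [30, 25]})
df2 = pd.DataFrame({'Age': [40, 35], 'Name': ['David', 'Charlie']})

# sort=True orders the column labels; rows keep their concatenation order
df = pd.concat([df1, df2], ignore_index=True, sort=True)
print(df.columns.tolist())

# Order the rows by Age and renumber them 0..n-1
df = df.sort_values('Age', ignore_index=True)
print(df)
```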

Example 14: Appending DataFrames with Sort

import pandas as pd

# Create two DataFrames with different column orders
df1 = pd.DataFrame({
    'Name': ['Alice', 'Bob'],
    'Website': ['pandasdataframe.com', 'pandasdataframe.com'],
    'Age': [25, 30]
})

df2 = pd.DataFrame({
    'Age': [35, 40],
    'Name': ['Charlie', 'David'],
    'Website': ['pandasdataframe.com', 'pandasdataframe.com']
})

# Append df2 to df1 and sort the column labels
df = pd.concat([df1, df2], ignore_index=True, sort=True)
print(df)

In conclusion, pd.concat(), together with plain column assignment, covers everything the old DataFrame.append() method was used for: adding single rows, whole DataFrames, or new columns, with options for managing the index, sorting columns, and handling mismatched columns and data types.