Pandas DataFrame Apply

Pandas is a powerful data manipulation library for Python, with many functions for transforming and analyzing data. One of its most versatile is apply(), which calls a function on each column or row of a DataFrame, or on each element of a Series. This article explores apply() in detail, with examples that show its use in a variety of scenarios.

Introduction to apply()

apply() is available on both DataFrame and Series objects. The signature of DataFrame.apply() is:

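DataFrame.apply(func, axis=0, raw=False, result_type=None, args=(), **kwds)
  • func: The function to apply to each column or row.
  • axis: The axis along which the function is applied. 0 (the default) passes each column to func; 1 passes each row.
  • raw: If False (the default), each row or column is passed as a Series; if True, it is passed as a NumPy ndarray, which can be faster for NumPy reductions.
  • result_type: One of 'expand', 'reduce', 'broadcast', or None (the default). It controls the shape of the result and only takes effect when axis=1.
  • args: A tuple of extra positional arguments passed to func after the row or column.
  • **kwds: Extra keyword arguments passed to func.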

Let’s dive into some examples to see how apply() can be used in different scenarios.

Example 1: Applying a Function to Each Column

In this example, we will apply a function that calculates the range (max – min) of each column in a DataFrame.

import pandas as pd

# Create a DataFrame
df = pd.DataFrame({
    'A': [1, 2, 3],
    'B': [4, 5, 6],
    'C': [7, 8, 9]
}, index=['pandasdataframe.com', 'pandasdataframe.com', 'pandasdataframe.com'])

# Define a function to calculate range
def calc_range(x):
    return x.max() - x.min()

# Apply function to each column
result = df.apply(calc_range)
print(result)

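To see what apply() actually passes to the function, the sketch below records the name and type of each argument. The `seen` list is only an illustration and is not part of the original example. With the default axis=0, each call receives one column as a Series whose name is the column label.

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6], 'C': [7, 8, 9]})

# Record what apply() passes to the function on each call (illustration only)
seen = []

def calc_range(col):
    seen.append((col.name, type(col).__name__))
    return col.max() - col.min()

# axis=0 (the default): the function receives one column at a time
result = df.apply(calc_range)

print(seen)
print(result)
```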

Example 2: Applying a Function to Each Row

Now, let’s apply a function to each row of the DataFrame. We will calculate the sum of values in each row.

import pandas as pd

# Create a DataFrame
df = pd.DataFrame({
    'A': [1, 2, 3],
    'B': [4, 5, 6],
    'C': [7, 8, 9]
}, index=['pandasdataframe.com', 'pandasdataframe.com', 'pandasdataframe.com'])

# Define a function to calculate sum
def calc_sum(row):
    return row.sum()

# Apply function to each row
result = df.apply(calc_sum, axis=1)
print(result)

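For a plain row sum, the built-in df.sum(axis=1) returns the same values without calling a Python function once per row. The comparison below uses the same data with a default index.

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6], 'C': [7, 8, 9]})

# Row-wise sum via apply(): one Python call per row
via_apply = df.apply(lambda row: row.sum(), axis=1)

# The equivalent built-in reduction, which runs vectorized
via_builtin = df.sum(axis=1)

print(via_apply.tolist())
print((via_apply == via_builtin).all())
```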

Example 3: Using Lambda Functions

Lambda functions are anonymous functions defined using the lambda keyword. They are handy when you need to apply a simple function quickly.

import pandas as pd

# Create a DataFrame
df = pd.DataFrame({
    'A': [1, 2, 3],
    'B': [4, 5, 6],
    'C': [7, 8, 9]
}, index=['pandasdataframe.com', 'pandasdataframe.com', 'pandasdataframe.com'])

# Apply a lambda function to each column to multiply by 2
result = df.apply(lambda x: x * 2)
print(result)

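A lambda also works with axis=1. In that case it receives one row at a time and can combine values from several columns. The sketch below stores such a row-wise result as a new column; the name A_plus_C is just an illustration. It also checks that the column-wise doubling from the example matches plain DataFrame arithmetic.

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6], 'C': [7, 8, 9]})

# Row-wise lambda: each call sees one row and combines two of its columns
df['A_plus_C'] = df.apply(lambda row: row['A'] + row['C'], axis=1)

# The column-wise lambda from the example is equivalent to arithmetic on the whole DataFrame
doubled = df[['A', 'B', 'C']].apply(lambda x: x * 2)
print(doubled.equals(df[['A', 'B', 'C']] * 2))
print(df)
```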

Example 4: Applying a Function that Returns Multiple Values

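Sometimes you might want a function to return several values for each row. If the function returns a list, passing result_type='expand' tells apply() to spread each list across separate columns. Note that result_type only takes effect when axis=1. If the function returns a Series instead, apply() builds a DataFrame from it by default, with or without result_type.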

import pandas as pd

# Create a DataFrame
df = pd.DataFrame({
    'A': [1, 2, 3],
    'B': [4, 5, 6],
    'C': [7, 8, 9]
}, index=['pandasdataframe.com', 'pandasdataframe.com', 'pandasdataframe.com'])

# Define a function that returns multiple values (as a list) for each row
def func(x):
    return [x.min(), x.max()]

# Apply function to each row; 'expand' turns each returned list into columns 0 and 1
result = df.apply(func, axis=1, result_type='expand')
print(result)

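The sketch below compares the result_type options side by side by applying a list-returning function row-wise, because result_type only takes effect with axis=1. The function name min_max is illustrative. A fourth option, 'reduce', does the opposite of 'expand': it returns a Series where possible instead of expanding list-like results.

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6], 'C': [7, 8, 9]})

def min_max(row):
    # Returns a plain list, not a Series
    return [row.min(), row.max()]

# None (default): each list stays a single value, giving a Series of lists
default = df.apply(min_max, axis=1)

# 'expand': each list is spread across new columns labeled 0 and 1
expanded = df.apply(min_max, axis=1, result_type='expand')

# 'broadcast': the result keeps the original shape and column labels
shifted = df.apply(lambda row: row - row.min(), axis=1, result_type='broadcast')

print(default)
print(expanded)
print(shifted)
```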

Example 5: Applying a Function with Additional Arguments

You can pass additional positional arguments to the function through the args parameter, which takes a tuple.

import pandas as pd

# Create a DataFrame
df = pd.DataFrame({
    'A': [1, 2, 3],
    'B': [4, 5, 6],
    'C': [7, 8, 9]
}, index=['pandasdataframe.com', 'pandasdataframe.com', 'pandasdataframe.com'])

# Define a function that uses additional arguments
def multiply(x, factor):
    return x * factor

# Apply function to each column with additional argument
result = df.apply(multiply, args=(10,))
print(result)

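You can also pass extra arguments by keyword. Any keyword argument that apply() does not use itself is forwarded to the function through **kwds. In the sketch below, scale() and its offset parameter are hypothetical. Because args must be a tuple, a single positional argument needs a trailing comma.

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6], 'C': [7, 8, 9]})

def scale(x, factor, offset=0):
    return x * factor + offset

# Positional extras go in args; (10,) with the trailing comma is a one-element tuple
r1 = df.apply(scale, args=(10,))

# Keyword extras are forwarded to the function through **kwds
r2 = df.apply(scale, factor=10, offset=1)

# Both can be combined
r3 = df.apply(scale, args=(10,), offset=1)

print(r1)
print(r2)
```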

Example 6: Using apply() with a Complex Function

apply() is not limited to simple arithmetic operations. You can use it to apply more complex functions, such as those involving conditional logic.

import pandas as pd

# Create a DataFrame
df = pd.DataFrame({
    'A': [1, 2, 3],
    'B': [4, 5, 6],
    'C': [7, 8, 9]
}, index=['pandasdataframe.com', 'pandasdataframe.com', 'pandasdataframe.com'])

# Define a complex function
def complex_function(x):
    if x['A'] > 1 and x['B'] < 6:
        return x['C'] * 2
    else:
        return x['C']

# Apply complex function to each row
result = df.apply(complex_function, axis=1)
print(result)

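You can also express the same rule without apply(). Build a boolean mask over whole columns, then choose values with NumPy's np.where, which is usually much faster on large DataFrames. The sketch below compares the two approaches on the same data.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6], 'C': [7, 8, 9]})

def complex_function(x):
    if x['A'] > 1 and x['B'] < 6:
        return x['C'] * 2
    return x['C']

# Row-by-row version from the example
via_apply = df.apply(complex_function, axis=1)

# Vectorized version: one boolean mask for all rows, then pick per row
mask = (df['A'] > 1) & (df['B'] < 6)
via_where = pd.Series(np.where(mask, df['C'] * 2, df['C']), index=df.index)

print(via_apply.tolist())
print(via_where.tolist())
```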

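Example 7: Applying a Function that Returns Modified Rows

apply() does not modify the DataFrame in place; it returns a new object. The pandas documentation also warns that functions which mutate the object passed to them are not supported and can produce unexpected results. To change values row by row, modify a copy inside the function and return it. Assign the result back if you want to keep it.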

import pandas as pd

# Create a DataFrame
df = pd.DataFrame({
    'A': [1, 2, 3],
    'B': [4, 5, 6],
    'C': [7, 8, 9]
}, index=['pandasdataframe.com', 'pandasdataframe.com', 'pandasdataframe.com'])

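# Define a function that returns a modified copy of each row
def modify_df(x):
    x = x.copy()  # don't mutate the object pandas passes in
    x['A'] = x['A'] * 2
    return x

# Apply function to each row; df itself is not changed
result = df.apply(modify_df, axis=1)
print(result)

# Assign the result back if you want to keep the change
df = result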

Example 8: Using apply() with a Function that Handles Missing Data

Handling missing data is a common task in data analysis. You can use apply() to apply a function that handles missing data in a specific way.

import pandas as pd
import numpy as np

# Create a DataFrame with missing values
df = pd.DataFrame({
    'A': [1, np.nan, 3],
    'B': [4, 5, np.nan],
    'C': [7, 8, 9]
}, index=['pandasdataframe.com', 'pandasdataframe.com', 'pandasdataframe.com'])

# Define a function that handles missing data
def handle_missing(x):
    return x.fillna(0)

# Apply function to each column
result = df.apply(handle_missing)
print(result)

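Because apply() passes each column as a whole, the fill value can depend on the column itself. The sketch below fills each column's gaps with that column's mean. For this particular case, df.fillna(df.mean()) is a shorter equivalent.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'A': [1, np.nan, 3], 'B': [4, 5, np.nan], 'C': [7, 8, 9]})

# Each call sees one full column, so col.mean() is that column's own mean
filled = df.apply(lambda col: col.fillna(col.mean()))

print(filled)
```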

Example 9: Applying a Function to Select Columns

You might want to apply a function only to specific columns of the DataFrame. This can be done by selecting the columns before applying the function.

import pandas as pd

# Create a DataFrame
df = pd.DataFrame({
    'A': [1, 2, 3],
    'B': [4, 5, 6],
    'C': [7, 8, 9]
}, index=['pandasdataframe.com', 'pandasdataframe.com', 'pandasdataframe.com'])

# Define a function to calculate square
def square(x):
    return x ** 2

# Apply function to selected columns
result = df[['A', 'C']].apply(square)
print(result)

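The example above returns a new DataFrame containing only the selected columns. To update those columns inside the original DataFrame instead, assign the result back to the same selection, as in the sketch below; column B is left untouched.

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6], 'C': [7, 8, 9]})

def square(x):
    return x ** 2

# Apply only to A and C, then write the results back into df
cols = ['A', 'C']
df[cols] = df[cols].apply(square)

print(df)
```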

Example 10: Combining apply() with Other Pandas Functions

apply() can be combined with other Pandas functions to perform more complex data manipulations. For example, you can use apply() along with groupby() to apply a function to each group separately.

import pandas as pd

# Create a DataFrame
df = pd.DataFrame({
    'Group': ['A', 'A', 'B', 'B', 'C', 'C'],
    'Value': [10, 20, 30, 40, 50, 60]
}, index=['pandasdataframe.com', 'pandasdataframe.com', 'pandasdataframe.com', 'pandasdataframe.com', 'pandasdataframe.com', 'pandasdataframe.com'])

# Define a function to calculate the mean
def calc_mean(x):
    return x.mean()

# Group by 'Group' and apply function to each group
result = df.groupby('Group')['Value'].apply(calc_mean)
print(result)

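For built-in reductions such as the mean, the grouped method (here .mean()) gives the same result more directly. apply() is most useful for custom per-group logic. The sketch below compares both and then computes the spread (max minus min) within each group as an example of custom logic.

```python
import pandas as pd

df = pd.DataFrame({
    'Group': ['A', 'A', 'B', 'B', 'C', 'C'],
    'Value': [10, 20, 30, 40, 50, 60],
})

# apply() on a grouped Series calls the function once per group
via_apply = df.groupby('Group')['Value'].apply(lambda s: s.mean())

# The dedicated grouped method gives the same result
via_mean = df.groupby('Group')['Value'].mean()

# Custom logic that has no single built-in method: spread within each group
spread = df.groupby('Group')['Value'].apply(lambda s: s.max() - s.min())

print(via_apply)
print(spread)
```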

Example 11: Using apply() to Implement Conditional Logic

apply() can be used to implement more complex conditional logic across rows or columns. This is particularly useful for creating new columns based on conditions.

import pandas as pd

# Create a DataFrame
df = pd.DataFrame({
    'A': [1, 2, 3],
    'B': [4, 5, 6],
    'C': [7, 8, 9]
}, index=['pandasdataframe.com', 'pandasdataframe.com', 'pandasdataframe.com'])

# Define a function with conditional logic
def check_value(x):
    if x['A'] > 1 and x['B'] < 6:
        return 'Condition met'
    else:
        return 'Condition not met'

# Apply function to each row
result = df.apply(check_value, axis=1)
print(result)

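Because the row-wise result is a Series aligned to the DataFrame's index, you can store it as a new column and use it afterwards, for example to select the matching rows. In the sketch below, the column name Status is illustrative.

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6], 'C': [7, 8, 9]})

def check_value(x):
    if x['A'] > 1 and x['B'] < 6:
        return 'Condition met'
    return 'Condition not met'

# Store the row-wise result as a new column
df['Status'] = df.apply(check_value, axis=1)

# Use the new column to select the rows that met the condition
met = df[df['Status'] == 'Condition met']
print(met)
```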

Example 12: Applying a Function that Uses External Data

Sometimes, the function you want to apply might need to use external data. You can pass this data to the function using the args or **kwds parameters.

import pandas as pd

# Create a DataFrame
df = pd.DataFrame({
    'A': [1, 2, 3],
    'B': [4, 5, 6],
    'C': [7, 8, 9]
}, index=['pandasdataframe.com', 'pandasdataframe.com', 'pandasdataframe.com'])

# External data
external_data = {'factor': 10}

# Define a function that uses external data
def use_external_data(x, data):
    return x * data['factor']

# Apply function to each column using external data
result = df.apply(use_external_data, args=(external_data,))
print(result)

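External data can also be keyed by column. Inside a column-wise apply(), col.name holds the column label, so the function can look up a per-column value. The sketch below passes the lookup as a keyword argument through **kwds. The factors dictionary and the scale_by_column name are hypothetical.

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6], 'C': [7, 8, 9]})

# Hypothetical per-column factors held outside the DataFrame
factors = {'A': 10, 'B': 100, 'C': 1}

def scale_by_column(col, lookup):
    # col.name is the column label, used here as the lookup key
    return col * lookup[col.name]

# The dict is passed by keyword and forwarded to the function via **kwds
result = df.apply(scale_by_column, lookup=factors)
print(result)
```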

Example 13: Using apply() for Data Normalization

Data normalization is a common preprocessing step in data analysis. You can use apply() to normalize data in a DataFrame.

import pandas as pd

# Create a DataFrame
df = pd.DataFrame({
    'A': [10, 20, 30],
    'B': [40, 50, 60],
    'C': [70, 80, 90]
}, index=['pandasdataframe.com', 'pandasdataframe.com', 'pandasdataframe.com'])

# Define a function for min-max normalization
def min_max_normalize(x):
    return (x - x.min()) / (x.max() - x.min())

# Apply normalization to each column
result = df.apply(min_max_normalize)
print(result)

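Since df.min() and df.max() return one value per column and broadcast across the DataFrame, you can also write the same min-max normalization without apply(). The sketch below checks that the two versions agree. If a column is constant, max() - min() is zero and that column becomes NaN in either version.

```python
import pandas as pd

df = pd.DataFrame({'A': [10, 20, 30], 'B': [40, 50, 60], 'C': [70, 80, 90]})

def min_max_normalize(x):
    return (x - x.min()) / (x.max() - x.min())

via_apply = df.apply(min_max_normalize)

# Per-column min and max broadcast across all rows
via_vectorized = (df - df.min()) / (df.max() - df.min())

print(via_apply)
print((via_apply == via_vectorized).all().all())
```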

Example 14: Applying a Function to Update DataFrame Based on Another DataFrame

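You can use apply() to update a DataFrame with values from another DataFrame, for example to patch one dataset with corrections from another. Because the function looks rows up by their index label, the labels must be unique. With duplicate labels, df2.loc[row.name] returns several rows instead of one.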

import pandas as pd

# Create two DataFrames with unique row labels; df2 covers only some of df1's rows
df1 = pd.DataFrame({
    'A': [1, 2, 3],
    'B': [4, 5, 6]
}, index=['row1', 'row2', 'row3'])

df2 = pd.DataFrame({
    'A': [100, 300],
    'B': [400, 600]
}, index=['row1', 'row3'])

# Define a function to update df1 based on df2
def update_df(row, df2):
    # row.name is the row's index label
    if row.name in df2.index:
        return df2.loc[row.name]
    return row

# Apply function to update df1 based on df2
result = df1.apply(update_df, args=(df2,), axis=1)
print(result)

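For this particular task, pandas has dedicated methods. df2.combine_first(df1) takes values from df2 where it has them and falls back to df1 elsewhere, and df1.update(df2) does the same in place. The sketch below uses combine_first on the same data.

```python
import pandas as pd

df1 = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]}, index=['row1', 'row2', 'row3'])
df2 = pd.DataFrame({'A': [100, 300], 'B': [400, 600]}, index=['row1', 'row3'])

# Values from df2 win; rows that df2 lacks keep df1's values
combined = df2.combine_first(df1)

print(combined)
```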

Example 15: Using apply() with MultiIndex DataFrames

apply() also works with DataFrames that have a MultiIndex. Combined with groupby(level=...), it applies a function separately to each group defined by one level of the index.

import pandas as pd

# Create a MultiIndex DataFrame
arrays = [
    ['A', 'A', 'B', 'B'],
    ['one', 'two', 'one', 'two']
]
index = pd.MultiIndex.from_arrays(arrays, names=('Group', 'Subgroup'))
df = pd.DataFrame({
    'Data': [1, 2, 3, 4]
}, index=index)

# Define a function to increment data
def increment_data(x):
    return x + 10

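# Apply function to each subgroup; group_keys=False keeps the original index
# (recent pandas versions otherwise prepend the group label as an extra level)
result = df.groupby(level='Subgroup', group_keys=False).apply(increment_data)
print(result)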

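Adding 10 gives the same result whatever the grouping. Grouping matters when the function depends on the group itself, for example when subtracting each group's mean. The sketch below does that per level 'Group' and keeps the original index with group_keys=False. groupby(...).transform() is a common alternative for this kind of like-indexed result.

```python
import pandas as pd

index = pd.MultiIndex.from_arrays(
    [['A', 'A', 'B', 'B'], ['one', 'two', 'one', 'two']],
    names=('Group', 'Subgroup'),
)
df = pd.DataFrame({'Data': [1, 2, 3, 4]}, index=index)

# Subtract each Group's own mean from its values
centered = df.groupby(level='Group', group_keys=False)['Data'].apply(lambda s: s - s.mean())

print(centered)
```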

Example 16: Applying a Function with Error Handling

When applying functions, especially to large datasets, it’s important to handle errors gracefully. You can include error handling within the function you apply.

import pandas as pd

# Create a DataFrame
df = pd.DataFrame({
    'A': [1, 'invalid', 3],
    'B': [4, 5, 'invalid']
}, index=['pandasdataframe.com', 'pandasdataframe.com', 'pandasdataframe.com'])

# Define a function with error handling
def safe_convert(x):
    try:
        return pd.to_numeric(x)
    except ValueError:
        return pd.NA

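# Apply function with error handling to each element.
# DataFrame.map() is the element-wise method in pandas 2.1 and later;
# earlier versions call the same method applymap()
result = df.map(safe_convert)
print(result)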

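A column-wise alternative is to let pd.to_numeric handle the errors itself. With errors='coerce', which apply() forwards as a keyword argument, it converts a whole column at once and turns entries it cannot parse into NaN instead of raising. The sketch below applies it to the same kind of data.

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 'invalid', 3], 'B': [4, 5, 'invalid']})

# Each column is converted in one call; unparseable values become NaN
result = df.apply(pd.to_numeric, errors='coerce')

print(result)
print(result.dtypes)
```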
Example 17: Using apply() to Aggregate Data

apply() can also aggregate data. When the function returns a single value for each column, apply() reduces the DataFrame to a Series with one entry per column.

import pandas as pd

# Create a DataFrame
df = pd.DataFrame({
    'A': [10, 20, 30, 40, 50],
    'B': [60, 70, 80, 90, 100],
    'C': [110, 120, 130, 140, 150]
}, index=['pandasdataframe.com', 'pandasdataframe.com', 'pandasdataframe.com', 'pandasdataframe.com', 'pandasdataframe.com'])

# Define a function to calculate the sum
def calculate_sum(x):
    return x.sum()

# Apply function to aggregate data
result = df.apply(calculate_sum)
print(result)

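The aggregating function can also apply a criterion, or return several statistics at once. The sketch below sums only the values above a hypothetical threshold of 75. It then builds a small summary, where returning a Series from the function gives one row per statistic.

```python
import pandas as pd

df = pd.DataFrame({
    'A': [10, 20, 30, 40, 50],
    'B': [60, 70, 80, 90, 100],
    'C': [110, 120, 130, 140, 150],
})

# Sum only the values above a hypothetical threshold
threshold = 75
above = df.apply(lambda col: col[col > threshold].sum())

# Returning a Series per column yields one row per statistic
summary = df.apply(lambda col: pd.Series({'sum': col.sum(), 'mean': col.mean()}))

print(above)
print(summary)
```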

Example 18: Using apply() for Data Transformation

Data transformation is a common task in data preprocessing. You can use apply() to transform data according to specific rules or functions.

import pandas as pd

# Create a DataFrame
df = pd.DataFrame({
    'A': [1, 2, 3],
    'B': [4, 5, 6],
    'C': [7, 8, 9]
}, index=['pandasdataframe.com', 'pandasdataframe.com', 'pandasdataframe.com'])

# Define a function to transform data
def transform_data(x):
    return x * 2 + 3

# Apply function to transform data
result = df.apply(transform_data)
print(result)

Example 19: Using apply() to Perform Row-wise Operations

Sometimes, you may need to perform operations that consider an entire row at once. apply() can be used to perform row-wise operations by setting axis=1.

import pandas as pd

# Create a DataFrame
df = pd.DataFrame({
    'A': [1, 2, 3],
    'B': [4, 5, 6],
    'C': [7, 8, 9]
}, index=['pandasdataframe.com', 'pandasdataframe.com', 'pandasdataframe.com'])

# Define a function to calculate the product of a row
def product_row(row):
    return row.prod()

# Apply function to calculate the product of each row
result = df.apply(product_row, axis=1)
print(result)

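For row-wise NumPy reductions like this one, raw=True passes each row as a NumPy ndarray instead of a Series. This skips building a Series for every row and is usually faster. The sketch below records the type the function receives; the received_types set is only an illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6], 'C': [7, 8, 9]})

# Record the type of each argument (illustration only)
received_types = set()

def product_row(row):
    received_types.add(type(row).__name__)
    return np.prod(row)

# raw=True: each row arrives as an ndarray
result = df.apply(product_row, axis=1, raw=True)

print(received_types)
print(result.tolist())
```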

Example 20: Applying a Function to Modify Index

apply() can also transform index labels. Convert the index to a Series with to_series(), apply the function to each label, and assign the result back to df.index.

import pandas as pd

# Create a DataFrame
df = pd.DataFrame({
    'Data': [100, 200, 300]
}, index=['first', 'second', 'third'])

# Define a function to modify index
def modify_index(x):
    return x.upper()

# Apply function to modify index
new_index = df.index.to_series().apply(modify_index)
df.index = new_index
print(df)

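An Index also has its own element-wise map() method, so the round trip through a Series is optional. For string labels, the vectorized .str accessor works as well. A short sketch of both:

```python
import pandas as pd

df = pd.DataFrame({'Data': [100, 200, 300]}, index=['first', 'second', 'third'])

# Index.map applies the function to each label directly
df.index = df.index.map(str.upper)

# The .str accessor on an Index of strings gives the same labels
same = pd.Index(['first', 'second', 'third']).str.upper()

print(df)
print(list(same))
```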

Example 21: Using apply() to Merge DataFrames

apply() can also combine DataFrames row by row, using a function that dictates the merging logic. Note that the function below raises an IndexError when a key in df1 has no match in df2.

import pandas as pd

# Create two DataFrames
df1 = pd.DataFrame({
    'Key': ['K0', 'K1', 'K2'],
    'A': ['A0', 'A1', 'A2']
})

df2 = pd.DataFrame({
    'Key': ['K0', 'K1', 'K2'],
    'B': ['B0', 'B1', 'B2']
})

# Define a function to merge rows based on the key
def merge_rows(x):
    row = df2[df2['Key'] == x['Key']]
    return pd.Series({
        'A': x['A'],
        'B': row.iloc[0]['B']
    })

# Apply function to merge DataFrames
result = df1.apply(merge_rows, axis=1)
print(result)

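For a straightforward key match like this one, pd.merge() performs the lookup as a single join. With how='left', every row of df1 is kept, and B is filled with NaN where df2 has no matching key instead of raising an error. A sketch on the same data:

```python
import pandas as pd

df1 = pd.DataFrame({'Key': ['K0', 'K1', 'K2'], 'A': ['A0', 'A1', 'A2']})
df2 = pd.DataFrame({'Key': ['K0', 'K1', 'K2'], 'B': ['B0', 'B1', 'B2']})

# Left join on Key: all rows of df1, matching B values from df2
merged = pd.merge(df1, df2, on='Key', how='left')

print(merged)
```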

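These examples illustrate the versatility of apply() in pandas. It handles tasks ranging from simple column transformations to row-wise conditional logic, group-wise calculations, and combining DataFrames. When pandas already offers a vectorized method for the same job, such as sum() or arithmetic on whole columns, that method is usually faster than apply().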