Pandas Apply Multiple Columns

Pandas is a powerful Python library used for data manipulation and analysis. One of its core functionalities is the ability to apply functions across multiple columns of a DataFrame. This capability is particularly useful when you need to perform operations that depend on multiple data points within a row. In this article, we will explore various ways to use the apply() function on multiple columns, providing detailed examples and explanations.

Introduction to Pandas apply()

The apply() function in Pandas calls a function along an axis of the DataFrame. With the default axis=0, the function receives each column as a Series; with axis=1, it receives each row as a Series whose index is the column names. When a calculation needs values from several columns, you therefore call apply() with axis=1, so the function can read all of a row's fields at once.
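
To make the two axes concrete, this minimal sketch applies the same max() reduction both ways on a small, illustrative DataFrame: with the default axis=0 the lambda sees one column at a time, and with axis=1 it sees one row at a time.

```python
import pandas as pd

# Illustrative data
df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]})

# axis=0 (the default): the lambda receives one column at a time,
# so the result has one value per column
col_max = df.apply(lambda col: col.max())

# axis=1: the lambda receives one row at a time (a Series indexed by
# 'A' and 'B'), so the result has one value per row
row_max = df.apply(lambda row: row.max(), axis=1)

print(col_max)
print(row_max)
```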

Basic Usage of apply()

Before diving into complex examples, let’s start with a basic example of using apply() on a single column to understand its syntax and behavior.

import pandas as pd

# Create a DataFrame
data = {'Name': ['Alice', 'Bob', 'Charlie'],
        'Age': [25, 30, 35]}
df = pd.DataFrame(data)

# Define a simple function to add a constant to age
def add_five(x):
    return x + 5

# Apply the function
df['Age_plus_five'] = df['Age'].apply(add_five)
print(df)
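
Series.apply() also accepts an args tuple of extra positional arguments, so one function can replace a family of add_five-style helpers. The add_n function in this sketch is a hypothetical generalization for illustration. For plain arithmetic like this, the vectorized expression df['Age'] + 10 gives the same result without apply().

```python
import pandas as pd

df = pd.DataFrame({'Name': ['Alice', 'Bob', 'Charlie'],
                   'Age': [25, 30, 35]})

# Hypothetical helper: adds any amount n instead of a fixed 5
def add_n(x, n):
    return x + n

# Each value is passed as x; the entries of args fill the remaining parameters
df['Age_plus_ten'] = df['Age'].apply(add_n, args=(10,))

# The vectorized form produces the same values without a Python-level loop
vectorized = df['Age'] + 10
print(df)
print((df['Age_plus_ten'] == vectorized).all())
```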

Applying Functions to Multiple Columns

To apply a function to multiple columns, pass apply() a function that accepts a Series and reads the columns it needs. With axis=1, each call receives one row, and you access its values by column name, for example row['A'].
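
Because apply() forwards any keyword arguments it does not use itself to your function, the column names can be parameters instead of hard-coded strings. This is a sketch with a hypothetical add_columns helper and made-up data:

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6], 'C': [7, 8, 9]})

# Hypothetical reusable helper: which columns to combine is decided by the
# caller, not hard-coded inside the function
def add_columns(row, left, right):
    return row[left] + row[right]

# left= and right= are not apply() parameters, so apply() passes them on
# to add_columns for every row
df['A_plus_C'] = df.apply(add_columns, axis=1, left='A', right='C')
print(df)
```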

Example 1: Calculate the Sum of Two Columns

import pandas as pd

# Create a DataFrame
data = {'A': [1, 2, 3],
        'B': [4, 5, 6]}
df = pd.DataFrame(data)

# Define a function that sums two columns
def sum_columns(row):
    return row['A'] + row['B']

# Apply the function across rows
df['A_plus_B'] = df.apply(sum_columns, axis=1)
print(df)
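
For a plain sum like this, apply() is not strictly necessary. A minimal sketch of two vectorized alternatives on the same data, both of which operate on whole columns instead of calling sum_columns once per row:

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]})

# Element-wise addition of the two columns
df['A_plus_B'] = df['A'] + df['B']

# Row-wise sum over a chosen subset of columns; handy when there are many
df['A_plus_B_sum'] = df[['A', 'B']].sum(axis=1)
print(df)
```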

Example 2: Combining Text from Multiple Columns

import pandas as pd

# Create a DataFrame
data = {'First': ['John', 'Jane', 'Jim'],
        'Last': ['Doe', 'Doe', 'Beam']}
df = pd.DataFrame(data)

# Define a function to combine first and last names
def full_name(row):
    return f"{row['First']} {row['Last']}"

# Apply the function
df['Full_Name'] = df.apply(full_name, axis=1)
print(df)
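
String columns can also be joined without apply(). This sketch shows two vectorized options on the same names: the + operator on string Series, and the str.cat() method with a sep argument.

```python
import pandas as pd

df = pd.DataFrame({'First': ['John', 'Jane', 'Jim'],
                   'Last': ['Doe', 'Doe', 'Beam']})

# + concatenates string Series element by element
df['Full_Name'] = df['First'] + ' ' + df['Last']

# str.cat() does the same, with the separator given explicitly
df['Full_Name_cat'] = df['First'].str.cat(df['Last'], sep=' ')
print(df)
```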

Example 3: Conditional Operations Across Columns

import pandas as pd

# Create a DataFrame
data = {'Temperature': [22, 28, 15],
        'Humidity': [80, 60, 78]}
df = pd.DataFrame(data)

# Define a function to check comfort level
def comfort_level(row):
    if row['Temperature'] > 25 and row['Humidity'] < 65:
        return 'Comfortable'
    else:
        return 'Uncomfortable'

# Apply the function
df['Comfort'] = df.apply(comfort_level, axis=1)
print(df)
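
The same if/else rule can be written as a vectorized condition with NumPy's np.where(), which evaluates the comparison for all rows at once. A sketch, assuming the same thresholds as comfort_level:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'Temperature': [22, 28, 15],
                   'Humidity': [80, 60, 78]})

# Combine element-wise comparisons with & (not `and`); the parentheses are required
is_comfortable = (df['Temperature'] > 25) & (df['Humidity'] < 65)

# np.where picks the first value where the mask is True, the second otherwise
df['Comfort'] = np.where(is_comfortable, 'Comfortable', 'Uncomfortable')
print(df)
```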

Example 4: Applying a Lambda Function

Lambda functions are anonymous functions defined with the lambda keyword. They are handy for simple operations that fit in a single expression.

import pandas as pd

# Create a DataFrame
data = {'X': [1, 2, 3],
        'Y': [4, 5, 6]}
df = pd.DataFrame(data)

# Apply a lambda function to sum columns
df['X_plus_Y'] = df.apply(lambda row: row['X'] + row['Y'], axis=1)
print(df)
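
apply() also has a raw parameter. With raw=True, each row arrives as a NumPy array rather than a Series, which avoids building a Series for every row; the trade-off is that values are accessed by position, so the lambda depends on the column order. A sketch on the same data:

```python
import pandas as pd

df = pd.DataFrame({'X': [1, 2, 3], 'Y': [4, 5, 6]})

# With raw=True the lambda receives a plain ndarray for each row:
# position 0 holds 'X' and position 1 holds 'Y' (the current column order)
df['X_plus_Y'] = df.apply(lambda row: row[0] + row[1], axis=1, raw=True)
print(df)
```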

Example 5: Calculating a Derived Value from Two Columns

A row-wise function can combine columns in any arithmetic expression. Here, each row's Income and Tax_Rate are used to compute after-tax income.

import pandas as pd

# Create a DataFrame
data = {'Income': [50000, 60000, 70000],
        'Tax_Rate': [0.2, 0.25, 0.3]}
df = pd.DataFrame(data)

# Define a function to calculate after-tax income
def after_tax_income(row):
    return row['Income'] * (1 - row['Tax_Rate'])

# Apply the function
df['After_Tax_Income'] = df.apply(after_tax_income, axis=1)
print(df)
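
Two related techniques, sketched on the same data. First, the after-tax calculation can be written as column arithmetic without apply(). Second, apply() here takes a single function; when you want different functions for different columns, DataFrame.agg() accepts a dict that maps each column name to a function name.

```python
import pandas as pd

df = pd.DataFrame({'Income': [50000, 60000, 70000],
                   'Tax_Rate': [0.2, 0.25, 0.3]})

# Vectorized equivalent of after_tax_income
df['After_Tax_Income'] = df['Income'] * (1 - df['Tax_Rate'])

# A different reduction per column: total income, average tax rate
summary = df.agg({'Income': 'sum', 'Tax_Rate': 'mean'})
print(df)
print(summary)
```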

Example 6: Complex Calculations

Sometimes, you might need to perform more complex calculations that involve multiple steps or conditions.

import pandas as pd

# Create a DataFrame
data = {'Base': [100, 200, 300],
        'Bonus': [50, 60, 70]}
df = pd.DataFrame(data)

# Define a complex function to calculate total compensation
def total_compensation(row):
    base = row['Base']
    bonus = row['Bonus']
    if base > 250:
        return base + bonus + 100  # Additional bonus for high base
    return base + bonus

# Apply the function
df['Total_Compensation'] = df.apply(total_compensation, axis=1)
print(df)
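
The conditional bonus can also be expressed without a per-row function. In this sketch, the comparison df['Base'] > 250 yields a boolean Series; multiplying it by 100 turns True into 100 and False into 0, and that amount is added to every row.

```python
import pandas as pd

df = pd.DataFrame({'Base': [100, 200, 300],
                   'Bonus': [50, 60, 70]})

# Boolean mask times 100: an extra 100 only where Base exceeds 250
extra = (df['Base'] > 250) * 100
df['Total_Compensation'] = df['Base'] + df['Bonus'] + extra
print(df)
```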

Example 7: Applying Functions that Return Multiple Values

In some cases, you might want your function to produce several new columns at once. Returning a Series from the function achieves this: apply() expands the returned Series into columns named after its index labels.

import pandas as pd

# Create a DataFrame
data = {'Height_cm': [170, 180, 190],
        'Weight_kg': [70, 80, 90]}
df = pd.DataFrame(data)

# Define a function to calculate BMI and categorize weight
def bmi_and_category(row):
    height_m = row['Height_cm'] / 100
    bmi = row['Weight_kg'] / (height_m ** 2)
    category = 'Normal' if 18.5 <= bmi < 25 else 'Abnormal'
    return pd.Series([bmi, category], index=['BMI', 'Category'])

# Apply the function
df[['BMI', 'Category']] = df.apply(bmi_and_category, axis=1)
print(df)
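
An alternative to returning a labelled Series is to return a plain tuple and pass result_type='expand', which tells apply() to spread each returned tuple across columns. The resulting columns are labelled 0 and 1, so this sketch renames them before joining them back onto the original DataFrame:

```python
import pandas as pd

df = pd.DataFrame({'Height_cm': [170, 180, 190],
                   'Weight_kg': [70, 80, 90]})

def bmi_and_category(row):
    height_m = row['Height_cm'] / 100
    bmi = row['Weight_kg'] / (height_m ** 2)
    category = 'Normal' if 18.5 <= bmi < 25 else 'Abnormal'
    return bmi, category  # a plain tuple instead of a labelled Series

# Each tuple becomes one row of a new DataFrame with columns 0 and 1
expanded = df.apply(bmi_and_category, axis=1, result_type='expand')
expanded.columns = ['BMI', 'Category']

# join() aligns on the shared index, adding BMI and Category to df
df = df.join(expanded)
print(df)
```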

Example 8: Using External Libraries in Functions

You can also call functions from external libraries, such as NumPy, inside the function you pass to apply().

import pandas as pd
import numpy as np

# Create a DataFrame
data = {'Angles': [0, 90, 180]}
df = pd.DataFrame(data)

# Define a function to calculate sine using numpy
def calculate_sine(row):
    return np.sin(np.radians(row['Angles']))

# Apply the function
df['Sine'] = df.apply(calculate_sine, axis=1)
print(df)
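
NumPy functions such as np.sin() and np.radians() accept whole arrays and Series, so this particular example does not need apply() at all. A sketch of the vectorized form follows. Because pi is not represented exactly in floating point, the sine of 180 degrees comes out as a tiny nonzero number (on the order of 1e-16) rather than exactly 0, so such results are best compared with a tolerance, for example via np.isclose().

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'Angles': [0, 90, 180]})

# np.radians and np.sin operate on the whole column at once
df['Sine'] = np.sin(np.radians(df['Angles']))
print(df)

# Tolerance-based check: sin(180 degrees) is tiny but not exactly 0
print(np.isclose(df['Sine'], [0.0, 1.0, 0.0]).all())
```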

Example 9: Performance Considerations

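While apply() is very flexible, with axis=1 it calls a Python function once per row, which becomes slow on large DataFrames. Where possible, prefer vectorized operations such as column arithmetic, NumPy functions like np.where(), and built-in methods like sum(axis=1), which work on whole columns at once. Methods such as agg() and transform() can also replace apply() for aggregations and group-wise calculations.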

import pandas as pd

# Create a DataFrame
data = {'Values': [1, 2, 3]}
df = pd.DataFrame(data)

# Square the column with a vectorized operation instead of apply()
df['Squared'] = df['Values'] ** 2
print(df)
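
To see the cost of per-row Python calls, here is a rough timing sketch on a larger, synthetic DataFrame (the column names and size are arbitrary). It checks that both approaches agree and prints the elapsed time of each. The exact numbers depend on hardware and pandas version, but the vectorized version is typically faster by a wide margin.

```python
import time

import numpy as np
import pandas as pd

# Synthetic data: 20,000 rows, two integer columns
n = 20_000
df = pd.DataFrame({'A': np.arange(n), 'B': np.arange(n, 2 * n)})

start = time.perf_counter()
slow = df.apply(lambda row: row['A'] * row['B'], axis=1)  # one Python call per row
apply_seconds = time.perf_counter() - start

start = time.perf_counter()
fast = df['A'] * df['B']  # a single vectorized multiplication
vectorized_seconds = time.perf_counter() - start

print((slow == fast).all())
print(f"apply(axis=1): {apply_seconds:.4f} s")
print(f"vectorized:    {vectorized_seconds:.4f} s")
```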

Conclusion

The apply() function in Pandas is a versatile tool for applying functions across multiple columns. It handles both simple and complex row-wise calculations, works with named functions and lambda functions, and can call external libraries such as NumPy. However, because apply() with axis=1 runs Python code for every row, consider vectorized alternatives when dealing with large datasets or computationally intensive tasks.