Pandas apply function to multiple columns

Pandas is a powerful tool for data manipulation and analysis in Python. One of its core functionalities is the ability to apply functions across multiple columns of a DataFrame. This capability is particularly useful when you need to perform complex computations or transformations on your data. In this article, we will explore how to use the apply function in Pandas to apply a function to multiple columns of a DataFrame. We will provide detailed examples to illustrate various use cases and techniques.

Introduction to the Pandas Apply Function

The apply function in Pandas calls a function along an axis of the DataFrame. With axis=0 (the default), the function receives each column as a Series. With axis=1, it receives each row as a Series indexed by column name. This makes apply versatile enough for both simple and complex functions. Before diving into applying functions to multiple columns, let’s briefly review how to use apply on a single column, where Series.apply calls the function once per value.
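The axis distinction is easiest to see side by side. This minimal sketch uses a small hypothetical DataFrame (columns A and B are illustrative only). It computes the range (maximum minus minimum) once per column and once per row:

```python
import pandas as pd

# Hypothetical data used only to contrast the two axes
df = pd.DataFrame({'A': [1, 2, 3], 'B': [10, 20, 30]})

# axis=0 (default): the function receives each column as a Series
col_ranges = df.apply(lambda col: col.max() - col.min())

# axis=1: the function receives each row as a Series indexed by column name
row_ranges = df.apply(lambda row: row.max() - row.min(), axis=1)

print(col_ranges)  # one value per column
print(row_ranges)  # one value per row
```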

Example 1: Applying a Simple Function to a Single Column

import pandas as pd

# Create a DataFrame
data = {'Name': ['Alice', 'Bob', 'Charlie'], 'Age': [25, 30, 35]}
df = pd.DataFrame(data)

# Define a simple function to add 10 to an age
def add_ten(age):
    return age + 10

# Apply the function to the 'Age' column
df['Age_plus_Ten'] = df['Age'].apply(add_ten)
print(df)

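Adding 10 is plain arithmetic, so the same column can also be produced without apply. This minimal sketch compares the two on the same data. Vectorized arithmetic runs over the whole column in compiled code instead of calling a Python function for each value, so it is generally the faster choice when the operation allows it:

```python
import pandas as pd

df = pd.DataFrame({'Name': ['Alice', 'Bob', 'Charlie'], 'Age': [25, 30, 35]})

# Series.apply calls the Python function once per element
via_apply = df['Age'].apply(lambda age: age + 10)

# Vectorized arithmetic operates on the whole column at once
via_vectorized = df['Age'] + 10

print(via_apply.tolist() == via_vectorized.tolist())  # True
```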

Now that we understand the basics, let’s explore how to apply functions to multiple columns.

Applying Functions to Multiple Columns

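To apply a function to multiple columns, call apply on the DataFrame with axis=1. The function then receives each row as a Series, so it can read any number of columns by name (for example row['A'] and row['B']). To restrict the operation to certain columns, first select them with a list of column names, as in df[['A', 'B']].apply(...). Lambda functions are convenient for short, one-off expressions.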
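This minimal sketch selects columns by list before calling apply. It assumes the same A/B/C data used in the next example. The row-wise product is a hypothetical operation chosen only for illustration:

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6], 'C': [7, 8, 9]})

# Selecting with a list of column names returns a smaller DataFrame
subset = df[['A', 'B']]

# Column-wise (default axis=0): one result per selected column
col_totals = subset.apply(lambda col: col.sum())

# Row-wise (axis=1): one result per row, using only the selected columns
row_totals = subset.apply(lambda row: row['A'] * row['B'], axis=1)

print(col_totals)
print(row_totals)
```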

Example 2: Applying a Function to Multiple Columns

import pandas as pd

# Create a DataFrame
data = {'A': [1, 2, 3], 'B': [4, 5, 6], 'C': [7, 8, 9]}
df = pd.DataFrame(data)

# Define a function that receives one row (a Series) and adds its 'A' and 'B' values
def sum_two_columns(row):
    return row['A'] + row['B']

# axis=1 passes each row to the function; the function can be passed directly, no lambda needed
df['A_plus_B'] = df.apply(sum_two_columns, axis=1)
print(df)

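A named function can be handed to apply directly. Any extra positional arguments it needs can be supplied through apply's args parameter. In this sketch, weighted_sum and its weights are hypothetical:

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6], 'C': [7, 8, 9]})

# Hypothetical helper: the weights are parameters rather than hard-coded values
def weighted_sum(row, weight_a, weight_b):
    return row['A'] * weight_a + row['B'] * weight_b

# apply calls weighted_sum(row, 2, 3) for every row
df['Weighted'] = df.apply(weighted_sum, axis=1, args=(2, 3))
print(df)
```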

Example 3: Using Lambda Functions Directly

import pandas as pd

# Create a DataFrame
data = {'A': [10, 20, 30], 'B': [40, 50, 60]}
df = pd.DataFrame(data)

# Apply a lambda function to add two columns
df['Sum'] = df.apply(lambda row: row['A'] + row['B'], axis=1)
print(df)

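A plain sum like this does not require apply. This sketch shows three alternatives on the same data, each producing the same column without a per-row Python call:

- direct column arithmetic
- sum(axis=1) over a list of columns
- DataFrame.eval with an expression string

```python
import pandas as pd

df = pd.DataFrame({'A': [10, 20, 30], 'B': [40, 50, 60]})

# Direct column arithmetic
df['Sum'] = df['A'] + df['B']

# Row-wise reduction over a list of columns
df['Sum_rowwise'] = df[['A', 'B']].sum(axis=1)

# Expression string evaluated against the DataFrame's columns
df['Sum_eval'] = df.eval('A + B')

print(df)
```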

Example 4: Applying a Function to Multiple Columns to Create a New Column

import pandas as pd

# Create a DataFrame
data = {'Temperature_C': [22, 25, 28], 'Wind_Speed_kmh': [15, 18, 20]}
df = pd.DataFrame(data)

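# Define a function to calculate the wind chill index (North American formula).
# The index is intended for temperatures at or below 10 °C and wind speeds above 4.8 km/h;
# the sample temperatures here are above that range, so treat the results as illustrative only.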
def wind_chill(temp, wind):
    return 13.12 + 0.6215 * temp - 11.37 * (wind ** 0.16) + 0.3965 * temp * (wind ** 0.16)

# Apply the function to the DataFrame
df['Wind_Chill'] = df.apply(lambda row: wind_chill(row['Temperature_C'], row['Wind_Speed_kmh']), axis=1)
print(df)

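Every operation inside wind_chill (multiplication, addition and **) also works element-wise on whole Series. The function can therefore be called once with the two columns instead of once per row. This sketch checks that the two approaches agree on the article's data:

```python
import pandas as pd

df = pd.DataFrame({'Temperature_C': [22, 25, 28], 'Wind_Speed_kmh': [15, 18, 20]})

def wind_chill(temp, wind):
    return 13.12 + 0.6215 * temp - 11.37 * (wind ** 0.16) + 0.3965 * temp * (wind ** 0.16)

# Pass entire columns: the arithmetic is applied element-wise to the Series
df['Wind_Chill'] = wind_chill(df['Temperature_C'], df['Wind_Speed_kmh'])

# Row-by-row version for comparison
via_apply = df.apply(lambda row: wind_chill(row['Temperature_C'], row['Wind_Speed_kmh']), axis=1)

print((df['Wind_Chill'] - via_apply).abs().max() < 1e-9)  # True
```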

Example 5: Applying a Function That Uses Multiple Columns to Filter Data

import pandas as pd

# Create a DataFrame
data = {'Product': ['Widget', 'Gadget', 'Doodad'], 'Price': [25, 20, 15], 'Quantity': [5, 10, 15]}
df = pd.DataFrame(data)

# Define a function to calculate total sales
def total_sales(price, quantity):
    return price * quantity

# Apply the function, then keep products with total sales over 100
# (with this data the totals are 125, 200 and 225, so every row is kept)
df['Total_Sales'] = df.apply(lambda row: total_sales(row['Price'], row['Quantity']), axis=1)
filtered_df = df[df['Total_Sales'] > 100]
print(filtered_df)

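The same filter can be written without apply. One option combines the two columns directly in a boolean mask; another uses DataFrame.query. The totals here are 125, 200 and 225. In this sketch the threshold is raised to 150 (an illustrative value, not from the original example) so that the filter actually removes a row:

```python
import pandas as pd

df = pd.DataFrame({'Product': ['Widget', 'Gadget', 'Doodad'],
                   'Price': [25, 20, 15],
                   'Quantity': [5, 10, 15]})

# Vectorized total instead of a per-row function
df['Total_Sales'] = df['Price'] * df['Quantity']

# Boolean mask built from two columns
mask = df['Price'] * df['Quantity'] > 150
high_volume = df[mask]

# Equivalent filter written as a query string against the new column
high_volume_q = df.query('Total_Sales > 150')

print(high_volume)
```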

Example 6: Complex Operations Using Multiple Columns

import pandas as pd

# Create a DataFrame
data = {'Height_cm': [170, 180, 190], 'Weight_kg': [70, 80, 90]}
df = pd.DataFrame(data)

# Define a function to calculate BMI: weight (kg) divided by height (m) squared
def calculate_bmi(height, weight):
    return (weight / ((height / 100) ** 2))

# Apply the function to calculate BMI
df['BMI'] = df.apply(lambda row: calculate_bmi(row['Height_cm'], row['Weight_kg']), axis=1)
print(df)

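A row-wise function can produce more than one value by returning a Series. apply with axis=1 then assembles those Series into a DataFrame whose columns are the Series labels. This sketch uses a hypothetical helper, bmi_with_height, that returns both the height in metres and the BMI, and joins the result back onto the original frame:

```python
import pandas as pd

df = pd.DataFrame({'Height_cm': [170, 180, 190], 'Weight_kg': [70, 80, 90]})

# Hypothetical helper returning two labeled values per row
def bmi_with_height(row):
    height_m = row['Height_cm'] / 100
    return pd.Series({'Height_m': height_m, 'BMI': row['Weight_kg'] / height_m ** 2})

# Each returned Series becomes one row of a new DataFrame
extra = df.apply(bmi_with_height, axis=1)

# Attach the new columns by index
df = df.join(extra)
print(df)
```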

Example 7: Applying Functions to Transform Data Based on Multiple Columns

import pandas as pd

# Create a DataFrame
data = {'First_Name': ['John', 'Jane', 'Jim'], 'Last_Name': ['Doe', 'Doe', 'Beam']}
df = pd.DataFrame(data)

# Define a function to create a full name
def full_name(first, last):
    return f"{first} {last}"

# Apply the function to create a full name
df['Full_Name'] = df.apply(lambda row: full_name(row['First_Name'], row['Last_Name']), axis=1)
print(df)

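The full name can also be built without a per-row function. String columns support + directly, and Series.str.cat joins two string columns with a separator. A minimal sketch on the same data:

```python
import pandas as pd

df = pd.DataFrame({'First_Name': ['John', 'Jane', 'Jim'], 'Last_Name': ['Doe', 'Doe', 'Beam']})

# Element-wise string concatenation with +
df['Full_Name'] = df['First_Name'] + ' ' + df['Last_Name']

# Series.str.cat does the same with an explicit separator
df['Full_Name_cat'] = df['First_Name'].str.cat(df['Last_Name'], sep=' ')

print(df)
```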

Example 8: Using Apply with Multiple Columns for Data Normalization

import pandas as pd

# Create a DataFrame
data = {'Scores': [200, 300, 400], 'Max_Score': [500, 500, 500]}
df = pd.DataFrame(data)

# Define a function that expresses a score as a percentage of the maximum score
def normalize(score, max_score):
    return (score / max_score) * 100

# Apply the function to normalize scores
df['Normalized_Score'] = df.apply(lambda row: normalize(row['Scores'], row['Max_Score']), axis=1)
print(df)

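apply also accepts raw=True, which passes each row to the function as a NumPy array instead of a Series. Values are then read by position, which avoids building a Series for every row. This sketch compares it with plain vectorized division on the same data; the column order in the selection determines the positions:

```python
import pandas as pd

df = pd.DataFrame({'Scores': [200, 300, 400], 'Max_Score': [500, 500, 500]})

# raw=True: r is a NumPy array; r[0] is 'Scores', r[1] is 'Max_Score' (selection order)
df['Normalized_raw'] = df[['Scores', 'Max_Score']].apply(
    lambda r: r[0] / r[1] * 100, axis=1, raw=True
)

# Vectorized division needs no apply at all
df['Normalized_vec'] = df['Scores'] / df['Max_Score'] * 100

print(df)
```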

Example 9: Applying a Function to Multiple Columns to Handle Missing Data

import pandas as pd

# Create a DataFrame with missing values
data = {'A': [1, None, 3], 'B': [4, 5, None]}
df = pd.DataFrame(data)

# Define a function to replace missing values with the column mean
def replace_missing(x):
    return x.fillna(x.mean())

# Without an axis argument apply uses axis=0, so each column is filled with its own mean
df = df.apply(replace_missing)
print(df)

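For this particular task pandas has a direct equivalent. DataFrame.mean() returns one mean per column, and fillna aligns that Series with the columns by name. Passing a dict instead limits the fill to selected columns. A sketch on the same data:

```python
import pandas as pd

df = pd.DataFrame({'A': [1, None, 3], 'B': [4, 5, None]})

# One mean per column, aligned to the columns by name
filled = df.fillna(df.mean())

# A dict fills only the listed columns; 'B' keeps its missing value
partial = df.fillna({'A': df['A'].mean()})

print(filled)
print(partial)
```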

Example 10: Using Apply to Aggregate Data Across Multiple Columns

import pandas as pd

# Create a DataFrame
data = {'Group': ['A', 'A', 'B', 'B', 'C'], 'Data': [1, 2, 3, 4, 5]}
df = pd.DataFrame(data)

# Define a function to calculate the sum of data for each group
def group_sum(group):
    return group.sum()

# Group by 'Group'; apply calls group_sum once per group with that group's 'Data' values
grouped_df = df.groupby('Group')['Data'].apply(group_sum)
print(grouped_df)

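The grouped data can also be summed with the built-in groupby sum. A function that needs several columns per group can be applied by selecting those columns first. In this sketch the Weight column and weighted_mean are hypothetical additions, used only to show a per-group calculation that reads two columns:

```python
import pandas as pd

df = pd.DataFrame({'Group': ['A', 'A', 'B', 'B', 'C'], 'Data': [1, 2, 3, 4, 5]})

# Built-in aggregation: same totals as groupby(...).apply(group_sum)
sums = df.groupby('Group')['Data'].sum()

# Hypothetical weights so the per-group function has two columns to read
df['Weight'] = [1, 1, 2, 1, 1]

def weighted_mean(group):
    # group is a DataFrame holding the selected columns for one group
    return (group['Data'] * group['Weight']).sum() / group['Weight'].sum()

# Select the needed columns, then apply the function once per group
weighted = df.groupby('Group')[['Data', 'Weight']].apply(weighted_mean)

print(sums)
print(weighted)
```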

Conclusion

In this article, we explored various ways to apply functions to multiple columns in a Pandas DataFrame using the apply function. The examples ranged from simple arithmetic to data transformation, normalization, missing-value handling and grouped aggregation. The apply function is flexible: with axis=1 it works row-wise, and with the default axis=0 it works column-wise. Keep in mind that a row-wise apply calls a Python function once per row. For plain arithmetic, vectorized column operations such as df['A'] + df['B'] are usually much faster. apply is most useful when the per-row logic cannot be expressed that way.