Pandas Apply Lambda on Multiple Columns

Pandas is a Python library for data manipulation and analysis. One of its most versatile features is the apply() method, which applies a function along an axis of a DataFrame. Combined with lambda functions, apply() lets you express row-wise or column-wise logic concisely. This article explores how to use apply() with lambda functions across multiple columns of a DataFrame.

Introduction to Pandas apply() and Lambda Functions

The apply() method in Pandas applies a function along an axis of a DataFrame: with axis=0 (the default) the function receives each column, and with axis=1 it receives each row. A lambda function is a small anonymous function defined with the keyword lambda. It can take any number of arguments but consists of a single expression, which makes it a good fit for short, throwaway functions that are not needed elsewhere in your code.
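To make the axis argument concrete, here is a minimal sketch. The DataFrame and the names col_ranges and row_sums are illustrative assumptions, not part of the examples that follow.

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]})

# axis=0 (the default): the lambda receives each column as a Series
col_ranges = df.apply(lambda col: col.max() - col.min())

# axis=1: the lambda receives each row as a Series indexed by column name
row_sums = df.apply(lambda row: row['A'] + row['B'], axis=1)

print(col_ranges)
print(row_sums)
```

With axis=0 the result has one entry per column; with axis=1 it has one entry per row, which is the form used whenever a lambda needs to read several columns at once.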

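Using apply() with lambda functions lets you express operations that combine several DataFrame columns in a single line. This approach is particularly useful when you need to combine multiple columns to create a new column or modify existing ones. Note that apply() with axis=1 calls the function once per row in Python, so on large DataFrames it is usually slower than equivalent vectorized column operations.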

Basic Usage of apply() with Lambda Functions

Let’s start with a simple example where we use apply() and a lambda function to add two columns of a DataFrame.

Example 1: Adding Two Columns

import pandas as pd

# Create a DataFrame
data = {
    'A': [1, 2, 3],
    'B': [4, 5, 6]
}
df = pd.DataFrame(data)

# Use apply() with a lambda function to add two columns
df['C'] = df.apply(lambda row: row['A'] + row['B'], axis=1)
print(df)

Output:

   A  B  C
0  1  4  5
1  2  5  7
2  3  6  9

Applying Functions to Multiple Columns

You can apply a function to multiple columns and create a new column based on the result. This is useful for more complex calculations.

Example 2: Calculating the Average of Multiple Columns

import pandas as pd

# Create a DataFrame
data = {
    'A': [10, 20, 30],
    'B': [40, 50, 60],
    'C': [70, 80, 90]
}
df = pd.DataFrame(data)

# Calculate the average of three columns
df['Average'] = df.apply(lambda row: (row['A'] + row['B'] + row['C']) / 3, axis=1)
print(df)

Output:

    A   B   C  Average
0  10  40  70     40.0
1  20  50  80     50.0
2  30  60  90     60.0
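When the calculation is a standard aggregation, apply() may not be needed at all. The sketch below reuses the data from Example 2 and introduces the hypothetical names cols, Average_apply and Average_vectorized. It compares a lambda over a column subset with the built-in DataFrame.mean(axis=1).

```python
import pandas as pd

df = pd.DataFrame({
    'A': [10, 20, 30],
    'B': [40, 50, 60],
    'C': [70, 80, 90]
})

# Columns to average (an illustrative choice)
cols = ['A', 'B', 'C']

# Row-wise lambda restricted to the selected columns
df['Average_apply'] = df[cols].apply(lambda row: row.mean(), axis=1)

# Vectorized equivalent: no Python function call per row
df['Average_vectorized'] = df[cols].mean(axis=1)

print(df)
```

Both columns hold the same values. Selecting the columns first keeps the lambda independent of the column names, and on large DataFrames the vectorized form is typically much faster.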

Conditional Logic Across Multiple Columns

Lambda functions with apply() can also be used to apply conditional logic across multiple columns.

Example 3: Apply Conditional Logic

import pandas as pd

# Create a DataFrame
data = {
    'A': [10, 20, 30],
    'B': [20, 15, 30]
}
df = pd.DataFrame(data)

# Take the larger of A and B in each row (max() compares the two values)
df['Max'] = df.apply(lambda row: max(row['A'], row['B']), axis=1)
print(df)

Output:

    A   B  Max
0  10  20   20
1  20  15   20
2  30  30   30
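Conditional logic can also be written explicitly with Python's conditional expression (value_if_true if condition else value_if_false) inside the lambda. The following sketch, using the same data as Example 3, labels which column holds the larger value. The column names Larger and Larger_vec and the label 'tie' are assumptions made for illustration, and np.select from NumPy is shown as one possible vectorized counterpart.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'A': [10, 20, 30],
    'B': [20, 15, 30]
})

# Conditional expression inside the lambda, evaluated once per row
df['Larger'] = df.apply(
    lambda row: 'A' if row['A'] > row['B']
    else ('B' if row['B'] > row['A'] else 'tie'),
    axis=1
)

# Vectorized counterpart: conditions are checked in order, default covers the rest
df['Larger_vec'] = np.select(
    [df['A'] > df['B'], df['B'] > df['A']],
    ['A', 'B'],
    default='tie'
)

print(df)
```

Nested conditional expressions become hard to read beyond two or three branches; at that point a named function passed to apply() is usually clearer than a lambda.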

More Complex Operations

You can perform more complex operations such as string manipulations and date calculations using apply() with lambda functions across multiple columns.

Example 4: Concatenating Strings from Multiple Columns

import pandas as pd

# Create a DataFrame
data = {
    'First Name': ['John', 'Jane', 'Alice'],
    'Last Name': ['Doe', 'Smith', 'Johnson']
}
df = pd.DataFrame(data)

# Concatenate first and last name
df['Full Name'] = df.apply(lambda row: row['First Name'] + " " + row['Last Name'], axis=1)
print(df)

Output:

  First Name Last Name      Full Name
0       John       Doe       John Doe
1       Jane     Smith     Jane Smith
2      Alice   Johnson  Alice Johnson

Example 5: Calculating Duration Between Dates

import pandas as pd

# Create a DataFrame
data = {
    'Start Date': pd.to_datetime(['2021-01-01', '2021-06-15', '2021-09-10']),
    'End Date': pd.to_datetime(['2021-01-05', '2021-06-20', '2021-09-15'])
}
df = pd.DataFrame(data)

# Calculate the duration in days between two dates
df['Duration'] = df.apply(lambda row: (row['End Date'] - row['Start Date']).days, axis=1)
print(df)

Output:

  Start Date   End Date  Duration
0 2021-01-01 2021-01-05         4
1 2021-06-15 2021-06-20         5
2 2021-09-10 2021-09-15         5
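For datetime columns, the same duration can be obtained by subtracting the columns directly and reading the day count through the .dt accessor. This is a minimal sketch using the data from Example 5; the column name Duration_vec is an illustrative assumption.

```python
import pandas as pd

df = pd.DataFrame({
    'Start Date': pd.to_datetime(['2021-01-01', '2021-06-15', '2021-09-10']),
    'End Date': pd.to_datetime(['2021-01-05', '2021-06-20', '2021-09-15'])
})

# Subtracting two datetime columns yields a timedelta column;
# .dt.days extracts the whole-day component of each difference
df['Duration_vec'] = (df['End Date'] - df['Start Date']).dt.days

print(df)
```

The row-wise lambda remains useful when the calculation depends on logic that has no direct column-level equivalent, such as skipping custom holidays.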

Advanced Use Cases

As you become more comfortable with using apply() and lambda functions, you can tackle more advanced data manipulation tasks.

Example 6: Applying a Complex Mathematical Formula

import pandas as pd

# Create a DataFrame
data = {
    'x': [1, 2, 3],
    'y': [4, 5, 6],
    'z': [7, 8, 9]
}
df = pd.DataFrame(data)

# Compute the Euclidean norm sqrt(x^2 + y^2 + z^2) for each row
df['w'] = df.apply(lambda row: (row['x']**2 + row['y']**2 + row['z']**2)**0.5, axis=1)
print(df)

Output:

   x  y  z          w
0  1  4  7   8.124038
1  2  5  8   9.643651
2  3  6  9  11.224972
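A lambda can also return several values per row. With result_type='expand', apply() turns a list-like return value into separate columns. The sketch below assumes the data from Example 6; the names expanded, result, total and norm are hypothetical.

```python
import pandas as pd

df = pd.DataFrame({
    'x': [1, 2, 3],
    'y': [4, 5, 6],
    'z': [7, 8, 9]
})

# Return a tuple per row; result_type='expand' spreads it across columns 0 and 1
expanded = df.apply(
    lambda row: (
        row['x'] + row['y'] + row['z'],
        (row['x']**2 + row['y']**2 + row['z']**2) ** 0.5,
    ),
    axis=1,
    result_type='expand'
)

# Give the new columns readable names and attach them to the original data
expanded.columns = ['total', 'norm']
result = df.join(expanded)

print(result)
```

Computing both values in one pass avoids iterating over the rows twice, which matters because each pass of a row-wise apply() runs in Python.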

Example 7: Filtering Rows Based on Multiple Column Criteria

import pandas as pd

# Create a DataFrame
data = {
    'Age': [25, 30, 35],
    'Salary': [50000, 60000, 70000],
    'Years of Experience': [2, 5, 8]
}
df = pd.DataFrame(data)

# Keep rows where age is greater than 30 and years of experience is at least 5.
# apply() builds a boolean mask (one True/False per row) that then indexes df.
mask = df.apply(lambda row: row['Age'] > 30 and row['Years of Experience'] >= 5, axis=1)
filtered_df = df[mask]
print(filtered_df)

Output:

   Age  Salary  Years of Experience
2   35   70000                    8
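Row filtering on simple comparisons is commonly written as a boolean mask built directly from the columns, without apply(). The following sketch uses the data from Example 7; the name mask is an illustrative assumption. Note that the element-wise operator & replaces Python's and, and each comparison needs its own parentheses.

```python
import pandas as pd

df = pd.DataFrame({
    'Age': [25, 30, 35],
    'Salary': [50000, 60000, 70000],
    'Years of Experience': [2, 5, 8]
})

# Each comparison yields a boolean Series; & combines them element-wise
mask = (df['Age'] > 30) & (df['Years of Experience'] >= 5)

# Indexing with the mask keeps only the rows where it is True
filtered_df = df[mask]

print(filtered_df)
```

Only the third row satisfies both conditions, because an age of exactly 30 does not pass the strict > 30 test.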

Conclusion

Using apply() with lambda functions in Pandas is a flexible way to perform data manipulation across multiple columns. The technique keeps code concise and readable and handles a wide range of tasks, from simple arithmetic to conditional logic, string concatenation, and date calculations. As you explore data analysis with Pandas, combining apply() with lambda functions is a valuable skill to have in your toolkit.