Pandas Apply Lambda on Multiple Columns
Pandas is a powerful data manipulation library in Python that offers extensive functionality for data analysis. One of its most versatile features is the apply() function, which applies a function along an axis of a DataFrame. Combined with lambda functions, apply() lets you express row-wise logic concisely. This article explores how to use apply() with lambda functions across multiple columns of a DataFrame.
Introduction to Pandas apply() and Lambda Functions
The apply() function in Pandas applies a function along an axis of a DataFrame: with axis=0 (the default) the function receives each column, and with axis=1 it receives each row. A lambda function is a small anonymous function defined with the keyword lambda. Lambda functions can take any number of arguments but can only have one expression. They are perfect for short, throwaway functions that are not needed elsewhere in your code.
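To make the axis argument concrete, here is a minimal sketch. The DataFrame and its column names are made up for illustration; the point is only to show what the lambda receives in each mode.

```python
import pandas as pd

# Hypothetical data, used only to illustrate the axis argument
df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]})

# axis=0 (the default): the lambda is called once per column
column_totals = df.apply(lambda col: col.sum(), axis=0)

# axis=1: the lambda is called once per row
row_totals = df.apply(lambda row: row.sum(), axis=1)

print(column_totals)
print(row_totals)
```

With axis=1 the lambda sees one row at a time as a Series, which is why the examples in this article can index it by column name.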
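Using apply() with lambda functions lets you express operations that span several DataFrame columns in a single line. This approach is particularly useful when you need to combine multiple columns to create a new column or modify existing columns. Note that apply() with axis=1 calls the function once per row in Python, so on large DataFrames it is usually slower than equivalent vectorized column operations.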
Basic Usage of apply() with Lambda Functions
Let’s start with a simple example where we use apply() and a lambda function to add two columns of a DataFrame.
Example 1: Adding Two Columns
import pandas as pd
# Create a DataFrame
data = {
    'A': [1, 2, 3],
    'B': [4, 5, 6]
}
df = pd.DataFrame(data)
# Use apply() with a lambda function to add two columns
df['C'] = df.apply(lambda row: row['A'] + row['B'], axis=1)
print(df)
Output:
   A  B  C
0  1  4  5
1  2  5  7
2  3  6  9
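For simple arithmetic like the sum in Example 1, the same result can usually be obtained by operating on the columns directly. The sketch below compares both approaches on the Example 1 data; the names via_apply and via_columns are illustrative only.

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]})

# Row-wise version, as in Example 1
via_apply = df.apply(lambda row: row['A'] + row['B'], axis=1)

# Column arithmetic operates on whole Series at once
via_columns = df['A'] + df['B']

# Both approaches produce the same values
print(via_apply.equals(via_columns))
```

The apply() form remains useful when the per-row logic cannot easily be written as column arithmetic.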
Applying Functions to Multiple Columns
You can apply a function to multiple columns and create a new column based on the result. This is useful for more complex calculations.
Example 2: Calculating the Average of Multiple Columns
import pandas as pd
# Create a DataFrame
data = {
    'A': [10, 20, 30],
    'B': [40, 50, 60],
    'C': [70, 80, 90]
}
df = pd.DataFrame(data)
# Calculate the average of three columns
df['Average'] = df.apply(lambda row: (row['A'] + row['B'] + row['C']) / 3, axis=1)
print(df)
Output:
    A   B   C  Average
0  10  40  70     40.0
1  20  50  80     50.0
2  30  60  90     60.0
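When a DataFrame also contains columns that should not take part in the calculation, one option is to select the relevant columns before calling apply(). The sketch below assumes a hypothetical 'Name' column alongside the three numeric columns from Example 2; score_cols and 'Average2' are illustrative names.

```python
import pandas as pd

# Hypothetical data with a non-numeric column alongside the scores
df = pd.DataFrame({
    'Name': ['Ann', 'Bob', 'Cy'],
    'A': [10, 20, 30],
    'B': [40, 50, 60],
    'C': [70, 80, 90],
})

score_cols = ['A', 'B', 'C']

# Only the selected columns are passed to the lambda
df['Average'] = df[score_cols].apply(lambda row: row.mean(), axis=1)

# The built-in row-wise mean gives the same result without a lambda
df['Average2'] = df[score_cols].mean(axis=1)
print(df)
```

Selecting columns first keeps the lambda independent of the column names and avoids touching the string column.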
Conditional Logic Across Multiple Columns
Lambda functions with apply() can also be used to apply conditional logic across multiple columns.
Example 3: Apply Conditional Logic
import pandas as pd
# Create a DataFrame
data = {
    'A': [10, 20, 30],
    'B': [20, 15, 30]
}
df = pd.DataFrame(data)
# Create a new column holding the larger of A and B for each row
df['Max'] = df.apply(lambda row: max(row['A'], row['B']), axis=1)
print(df)
Output:
    A   B  Max
0  10  20   20
1  20  15   20
2  30  30   30
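Example 3 relies on the built-in max(). An explicit conditional expression inside the lambda works the same way. The sketch below labels each row by which column is larger; the column names 'Larger' and 'Larger_np' and the 'tie' label are assumptions made for illustration. A column-wise equivalent using NumPy's select() is shown for comparison.

```python
import pandas as pd
import numpy as np

df = pd.DataFrame({'A': [10, 20, 30], 'B': [20, 15, 30]})

# A conditional expression inside the lambda picks a label per row
df['Larger'] = df.apply(
    lambda row: 'A' if row['A'] > row['B'] else ('B' if row['B'] > row['A'] else 'tie'),
    axis=1,
)

# np.select evaluates the same conditions on whole columns
df['Larger_np'] = np.select(
    [df['A'] > df['B'], df['B'] > df['A']],
    ['A', 'B'],
    default='tie',
)
print(df)
```

Once a lambda needs more than one nested condition, a named function is often easier to read than a chained conditional expression.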
More Complex Operations
You can perform more complex operations such as string manipulations and date calculations using apply() with lambda functions across multiple columns.
Example 4: Concatenating Strings from Multiple Columns
import pandas as pd
# Create a DataFrame
data = {
    'First Name': ['John', 'Jane', 'Alice'],
    'Last Name': ['Doe', 'Smith', 'Johnson']
}
df = pd.DataFrame(data)
# Concatenate first and last name
df['Full Name'] = df.apply(lambda row: row['First Name'] + " " + row['Last Name'], axis=1)
print(df)
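Output:
  First Name Last Name      Full Name
0       John       Doe       John Doe
1       Jane     Smith     Jane Smith
2      Alice   Johnson  Alice Johnson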
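Two variations on the string concatenation from Example 4 are sketched below: an f-string inside the lambda, and the Series.str.cat() method, which joins string columns without apply(). The column name 'Full Name (str.cat)' is illustrative only.

```python
import pandas as pd

df = pd.DataFrame({
    'First Name': ['John', 'Jane', 'Alice'],
    'Last Name': ['Doe', 'Smith', 'Johnson'],
})

# f-string inside the lambda, equivalent to concatenating with +
df['Full Name'] = df.apply(lambda row: f"{row['First Name']} {row['Last Name']}", axis=1)

# Series.str.cat joins two string columns with a separator
df['Full Name (str.cat)'] = df['First Name'].str.cat(df['Last Name'], sep=' ')
print(df)
```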
Example 5: Calculating Duration Between Dates
import pandas as pd
# Create a DataFrame
data = {
    'Start Date': pd.to_datetime(['2021-01-01', '2021-06-15', '2021-09-10']),
    'End Date': pd.to_datetime(['2021-01-05', '2021-06-20', '2021-09-15'])
}
df = pd.DataFrame(data)
# Calculate the duration in days between two dates
df['Duration'] = df.apply(lambda row: (row['End Date'] - row['Start Date']).days, axis=1)
print(df)
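Output:
  Start Date   End Date  Duration
0 2021-01-01 2021-01-05         4
1 2021-06-15 2021-06-20         5
2 2021-09-10 2021-09-15         5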
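The duration calculation from Example 5 can also be written without apply(), by subtracting the two datetime columns and reading the day count through the .dt accessor. This is a sketch on the same data.

```python
import pandas as pd

df = pd.DataFrame({
    'Start Date': pd.to_datetime(['2021-01-01', '2021-06-15', '2021-09-10']),
    'End Date': pd.to_datetime(['2021-01-05', '2021-06-20', '2021-09-15']),
})

# Subtracting two datetime columns yields a timedelta column;
# the .dt accessor exposes its whole-day component
df['Duration'] = (df['End Date'] - df['Start Date']).dt.days
print(df)
```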
Advanced Use Cases
As you become more comfortable with using apply() and lambda functions, you can tackle more advanced data manipulation tasks.
Example 6: Applying a Complex Mathematical Formula
import pandas as pd
# Create a DataFrame
data = {
    'x': [1, 2, 3],
    'y': [4, 5, 6],
    'z': [7, 8, 9]
}
df = pd.DataFrame(data)
# Compute the Euclidean norm sqrt(x^2 + y^2 + z^2) of each row
df['w'] = df.apply(lambda row: (row['x']**2 + row['y']**2 + row['z']**2)**0.5, axis=1)
print(df)
Output:
   x  y  z          w
0  1  4  7   8.124038
1  2  5  8   9.643651
2  3  6  9  11.224972
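A lambda can also return several values per row. In the sketch below, result_type='expand' tells apply() to spread each returned tuple across separate columns, so two new columns are created in one pass. The column names 'total' and 'largest' are assumptions chosen for illustration.

```python
import pandas as pd

df = pd.DataFrame({'x': [1, 2, 3], 'y': [4, 5, 6], 'z': [7, 8, 9]})

# The lambda returns two values per row; result_type='expand'
# turns each returned sequence into separate columns
df[['total', 'largest']] = df.apply(
    lambda row: (row['x'] + row['y'] + row['z'], max(row['x'], row['y'], row['z'])),
    axis=1,
    result_type='expand',
)
print(df)
```

This avoids calling apply() twice when several derived columns depend on the same inputs.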
Example 7: Filtering Rows Based on Multiple Column Criteria
import pandas as pd
# Create a DataFrame
data = {
    'Age': [25, 30, 35],
    'Salary': [50000, 60000, 70000],
    'Years of Experience': [2, 5, 8]
}
df = pd.DataFrame(data)
# Build a boolean mask: True where age is greater than 30 and years of experience is at least 5
mask = df.apply(lambda row: row['Age'] > 30 and row['Years of Experience'] >= 5, axis=1)
# Keep only the rows where the mask is True
filtered_df = df[mask]
print(filtered_df)
Output:
   Age  Salary  Years of Experience
2   35   70000                    8
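Row filtering on several column conditions is commonly written with a boolean mask built from column comparisons rather than with apply(). A minimal sketch using the data from Example 7:

```python
import pandas as pd

df = pd.DataFrame({
    'Age': [25, 30, 35],
    'Salary': [50000, 60000, 70000],
    'Years of Experience': [2, 5, 8],
})

# Each comparison yields a boolean Series; & combines them element-wise.
# The parentheses are required because & binds tighter than > and >=.
mask = (df['Age'] > 30) & (df['Years of Experience'] >= 5)
filtered_df = df[mask]
print(filtered_df)
```

Only the row with Age 35 and 8 years of experience satisfies both conditions, so the result keeps that single row with its original index.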
Conclusion
Using apply() with lambda functions in Pandas is a flexible way to perform data manipulation across multiple columns. This technique allows for concise, readable code and can handle a wide range of tasks, from simple arithmetic to conditional logic, string handling, and date calculations. As you explore data analysis with Pandas, combining apply() with lambda functions is a valuable skill to have in your toolkit.