Pandas apply function to each row
Pandas is a Python library for data manipulation and analysis. One of its core features is the ability to apply a function to each row of a DataFrame, which is useful for data transformation, cleaning, and preparation. This article shows how to use the apply function in several scenarios to manipulate DataFrame rows. Each example is self-contained and can be run independently.
Introduction to the Apply Function
The apply function in Pandas calls a function along one axis of a DataFrame. With the default axis=0 the function receives each column in turn; to work row by row, set the axis parameter to 1 (or 'columns'). In that case the function receives each row as a Series indexed by the column names, and it can return either a scalar, which yields one value per row, or a Series, which yields one column per returned label. This makes apply highly flexible.
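As a minimal sketch of what the applied function actually receives, the snippet below checks the type passed in for each axis. The small DataFrame and the describe helper are hypothetical names introduced only for illustration.

```python
import pandas as pd

# Hypothetical two-row frame used only for illustration
df = pd.DataFrame({'Name': ['Alice', 'Bob'], 'Age': [25, 30]})

def describe(row):
    # With axis=1, `row` is a Series indexed by the column names
    return f"{row['Name']} is {row['Age']}"

# One scalar per row, collected into a Series
summary = df.apply(describe, axis=1)

# Record the type of object handed to the function on each axis
row_types = df.apply(lambda row: type(row).__name__, axis=1)
col_types = df.apply(lambda col: type(col).__name__, axis=0)

print(summary.tolist())
print(row_types.tolist())
print(col_types.to_dict())
```

Both axes hand the function a Series; the difference is whether that Series is one row (labelled by column names) or one column (labelled by the row index).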
Example 1: Basic Usage of Apply on Rows
import pandas as pd

# Create a DataFrame
data = {'Name': ['Alice', 'Bob', 'Charlie'],
        'Age': [25, 30, 35],
        'Email': ['[email protected]', '[email protected]', '[email protected]']}
df = pd.DataFrame(data)

# Define a simple function to add a prefix to the email
def add_prefix(row):
    return "User_" + row['Email']

# Apply the function to each row
df['Email'] = df.apply(add_prefix, axis=1)
print(df)
Example 2: Using Apply with a Lambda Function
import pandas as pd

# Create a DataFrame
data = {'Product': ['Widget', 'Gadget', 'Doodad'],
        'Price': [25.00, 35.50, 12.75],
        'Website': ['www.widgetcorp.com', 'www.gadgetpro.com', 'www.doodadsplace.com']}
df = pd.DataFrame(data)

# Use a lambda function to add a suffix to the product name
df['Product'] = df.apply(lambda row: row['Product'] + "_pandasdataframe.com", axis=1)
print(df)
Example 3: Conditional Operations Using Apply
import pandas as pd

# Create a DataFrame
data = {'Employee': ['John Doe', 'Jane Smith', 'Anne Johnson'],
        'Salary': [90000, 110000, 75000],
        'Department': ['Sales', 'Marketing', 'HR']}
df = pd.DataFrame(data)

# Define a function to apply a 10% raise to the Sales department only
def apply_raise(row):
    if row['Department'] == 'Sales':
        return row['Salary'] * 1.10
    else:
        return row['Salary']

# Apply the function to each row
df['Salary'] = df.apply(apply_raise, axis=1)
print(df)
Example 4: Modifying Multiple Columns Using Apply
import pandas as pd

# Create a DataFrame
data = {'Name': ['Alice', 'Bob', 'Charlie'],
        'Age': [25, 30, 35],
        'Email': ['[email protected]', '[email protected]', '[email protected]']}
df = pd.DataFrame(data)

# Define a function to increment age and modify email
def modify_info(row):
    # Work on a copy: mutating the object passed in by apply is not supported
    row = row.copy()
    row['Age'] += 1
    row['Email'] = "Updated_" + row['Email']
    return row

# Apply the function to each row; returning a Series rebuilds the whole DataFrame
df = df.apply(modify_info, axis=1)
print(df)
Example 5: Using Apply to Generate a New Column
import pandas as pd

# Create a DataFrame
data = {'Name': ['Alice', 'Bob', 'Charlie'],
        'Age': [25, 30, 35],
        'Country': ['USA', 'Canada', 'UK']}
df = pd.DataFrame(data)

# Define a function to determine eligibility based on age
def check_eligibility(row):
    return "Eligible" if row['Age'] > 28 else "Not Eligible"

# Apply the function to create a new column
df['Status'] = df.apply(check_eligibility, axis=1)
print(df)
Example 6: Complex Operations Involving Multiple Columns
import pandas as pd

# Create a DataFrame
data = {'First Name': ['Alice', 'Bob', 'Charlie'],
        'Last Name': ['Smith', 'Jones', 'Brown'],
        'Domain': ['@pandasdataframe.com', '@pandasdataframe.com', '@pandasdataframe.com']}
df = pd.DataFrame(data)

# Define a function to create a full email address
def create_email(row):
    return row['First Name'].lower() + "." + row['Last Name'].lower() + row['Domain']

# Apply the function to each row
df['Email'] = df.apply(create_email, axis=1)
print(df)
Example 7: Applying Functions that Return Multiple Values
import pandas as pd

# Create a DataFrame
data = {'Name': ['Alice', 'Bob', 'Charlie'],
        'Age': [25, 30, 35]}
df = pd.DataFrame(data)

# Define a function to return multiple new values
def process_row(row):
    age_next_year = row['Age'] + 1
    age_in_ten_years = row['Age'] + 10
    return pd.Series([age_next_year, age_in_ten_years], index=['Age Next Year', 'Age in Ten Years'])

# Apply the function to each row; each label of the returned Series becomes a column
df[['Age Next Year', 'Age in Ten Years']] = df.apply(process_row, axis=1)
print(df)
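An alternative worth knowing, sketched here rather than taken from the example above, is to return a plain tuple and let apply spread it into columns with result_type='expand'. The future_ages helper and the column names assigned afterwards are illustrative assumptions.

```python
import pandas as pd

df = pd.DataFrame({'Name': ['Alice', 'Bob', 'Charlie'],
                   'Age': [25, 30, 35]})

def future_ages(row):
    # Return a plain tuple; result_type='expand' turns each element into a column
    return row['Age'] + 1, row['Age'] + 10

# The expanded result has integer column labels 0 and 1, so rename them
expanded = df.apply(future_ages, axis=1, result_type='expand')
expanded.columns = ['Age Next Year', 'Age in Ten Years']

# Join on the shared row index to attach the new columns
result = df.join(expanded)
print(result)
```

This avoids building a Series inside the function on every row, at the cost of naming the columns in a separate step.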
Example 8: Error Handling in Apply Functions
import pandas as pd

# Create a DataFrame
data = {'Name': ['Alice', 'Bob', 'Charlie'],
        'Birth Year': [1985, 'unknown', 1990]}
df = pd.DataFrame(data)

# Define a function to calculate age, falling back to 'Unknown' for bad values
def calculate_age(row):
    try:
        return 2023 - int(row['Birth Year'])
    except ValueError:
        return 'Unknown'

# Apply the function to each row
df['Age'] = df.apply(calculate_age, axis=1)
print(df)
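For this particular kind of cleanup, a vectorized alternative may be simpler. The sketch below uses pd.to_numeric with errors='coerce', which turns unparseable entries into NaN instead of raising. REFERENCE_YEAR is a hypothetical constant that mirrors the hard-coded 2023 above.

```python
import pandas as pd

df = pd.DataFrame({'Name': ['Alice', 'Bob', 'Charlie'],
                   'Birth Year': [1985, 'unknown', 1990]})

# Hypothetical constant mirroring the hard-coded year in the example above
REFERENCE_YEAR = 2023

# Unparseable values such as 'unknown' become NaN rather than raising
birth_year = pd.to_numeric(df['Birth Year'], errors='coerce')

# NaN propagates through the subtraction, so the column stays numeric
df['Age'] = REFERENCE_YEAR - birth_year
print(df)
```

The trade-off is that missing ages are represented as NaN in a float column rather than as the string 'Unknown', which keeps the column usable for arithmetic.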
Example 9: Using External Parameters in Apply Functions
import pandas as pd

# Create a DataFrame
data = {'Name': ['Alice', 'Bob', 'Charlie'],
        'Sales': [300, 450, 500]}
df = pd.DataFrame(data)

# Define a function to apply a custom multiplier
def apply_custom_multiplier(row, multiplier):
    return row['Sales'] * multiplier

# Apply the function with an external multiplier, passed positionally via args
multiplier = 1.1
df['Adjusted Sales'] = df.apply(apply_custom_multiplier, axis=1, args=(multiplier,))
print(df)
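Extra parameters can also be passed by keyword, since apply forwards any additional keyword arguments to the function. The following is a small sketch of that variant; scale_sales is a hypothetical name chosen for illustration.

```python
import pandas as pd

df = pd.DataFrame({'Name': ['Alice', 'Bob', 'Charlie'],
                   'Sales': [300, 450, 500]})

def scale_sales(row, multiplier=1.0):
    # `multiplier` arrives as a keyword argument forwarded by apply
    return row['Sales'] * multiplier

# Keyword arguments that apply does not recognise are passed through to the function
df['Adjusted Sales'] = df.apply(scale_sales, axis=1, multiplier=1.1)
print(df)
```

Passing by keyword tends to read more clearly than args when the function takes several extra parameters.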
Example 10: Vectorized Operations vs Apply
import pandas as pd

# Create a DataFrame
data = {'Name': ['Alice', 'Bob', 'Charlie'],
        'Score': [88, 92, 85]}
df = pd.DataFrame(data)

# Define a function to add points
def add_points(row):
    return row['Score'] + 5

# Apply the function to each row
df['New Score'] = df.apply(add_points, axis=1)
print(df)

# Alternatively, use vectorized operations for better performance
df['New Score Vectorized'] = df['Score'] + 5
print(df)
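Vectorization is not limited to simple arithmetic. As a rough sketch, the conditional raise from Example 3 could be written with numpy.where instead of a row-wise function; the is_sales mask is a name introduced here for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'Employee': ['John Doe', 'Jane Smith', 'Anne Johnson'],
                   'Salary': [90000, 110000, 75000],
                   'Department': ['Sales', 'Marketing', 'HR']})

# Boolean mask selecting the rows that should receive the raise
is_sales = df['Department'] == 'Sales'

# Where the mask is True take the raised salary, otherwise keep the original
df['Salary'] = np.where(is_sales, df['Salary'] * 1.10, df['Salary'])
print(df)
```

On small frames the difference is negligible, but this form operates on whole columns at once and so tends to scale better than calling a Python function per row.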
Conclusion
The apply function is a versatile tool in Pandas for row-wise transformations. Whether you're modifying a single column, creating new columns, or handling errors in messy data, apply with axis=1 lets you express the logic as an ordinary Python function. Because that function is called once per row, it is slower than a vectorized operation on large DataFrames, so prefer the vectorized form when one exists, as in Example 10, and reserve apply for logic that cannot easily be expressed that way.