Pandas apply function to each row

Pandas is a Python library for data manipulation and analysis. One of its core features is the ability to apply a function to each row of a DataFrame, which is useful for data transformation, cleaning, and preparation. This article shows how to use the apply function to manipulate DataFrame rows in several scenarios. Each example is self-contained and can be run independently.

Introduction to the Apply Function

The apply function in Pandas applies a function along an axis of a DataFrame. With axis=0 (the default), the function receives each column; with axis=1, it receives each row. In both cases the argument is a Series, and for row-wise calls the index of that Series holds the column labels. The function can be any callable that returns a scalar, a Series, or a list-like value, and the shape of the result depends on what it returns.
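
As a minimal sketch of the two axis settings (the describe helper and the sample data here are illustrative assumptions, not part of the examples that follow):

```python
import pandas as pd

# Small illustrative DataFrame (hypothetical data)
df = pd.DataFrame({'Name': ['Alice', 'Bob'], 'Age': [25, 30]})

def describe(row):
    # With axis=1, `row` is a Series indexed by the column labels
    return f"{row['Name']} is {row['Age']}"

# axis=1: the function is called once per row
row_result = df.apply(describe, axis=1)

# axis=0 (the default): the function is called once per column
col_result = df.apply(len, axis=0)

print(row_result)
print(col_result)
```

The row-wise call returns one value per row, indexed like the DataFrame; the column-wise call returns one value per column, indexed by column label.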

Example 1: Basic Usage of Apply on Rows

import pandas as pd

# Create a DataFrame
data = {'Name': ['Alice', 'Bob', 'Charlie'],
        'Age': [25, 30, 35],
        'Email': ['[email protected]', '[email protected]', '[email protected]']}
df = pd.DataFrame(data)

# Define a simple function to add a prefix to the email
def add_prefix(row):
    return "User_" + row['Email']

# Apply the function to each row
df['Email'] = df.apply(add_prefix, axis=1)
print(df)

Example 2: Using Apply with a Lambda Function

import pandas as pd

# Create a DataFrame
data = {'Product': ['Widget', 'Gadget', 'Doodad'],
        'Price': [25.00, 35.50, 12.75],
        'Website': ['www.widgetcorp.com', 'www.gadgetpro.com', 'www.doodadsplace.com']}
df = pd.DataFrame(data)

# Use a lambda function to add a suffix to the product name
df['Product'] = df.apply(lambda row: row['Product'] + "_pandasdataframe.com", axis=1)
print(df)

Example 3: Conditional Operations Using Apply

import pandas as pd

# Create a DataFrame
data = {'Employee': ['John Doe', 'Jane Smith', 'Anne Johnson'],
        'Salary': [90000, 110000, 75000],
        'Department': ['Sales', 'Marketing', 'HR']}
df = pd.DataFrame(data)

# Define a function to apply a raise based on department
def apply_raise(row):
    if row['Department'] == 'Sales':
        return row['Salary'] * 1.10
    else:
        return row['Salary']

# Apply the function to each row
df['Salary'] = df.apply(apply_raise, axis=1)
print(df)

Example 4: Modifying Multiple Columns Using Apply

import pandas as pd

# Create a DataFrame
data = {'Name': ['Alice', 'Bob', 'Charlie'],
        'Age': [25, 30, 35],
        'Email': ['[email protected]', '[email protected]', '[email protected]']}
df = pd.DataFrame(data)

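# Define a function to increment age and modify email
def modify_info(row):
    # Work on a copy: mutating the row object passed in by apply
    # can produce unexpected behavior
    row = row.copy()
    row['Age'] += 1
    row['Email'] = "Updated_" + row['Email']
    return row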

# Apply the function to each row
df = df.apply(modify_info, axis=1)
print(df)

Example 5: Using Apply to Generate a New Column

import pandas as pd

# Create a DataFrame
data = {'Name': ['Alice', 'Bob', 'Charlie'],
        'Age': [25, 30, 35],
        'Country': ['USA', 'Canada', 'UK']}
df = pd.DataFrame(data)

# Define a function to determine eligibility based on age
def check_eligibility(row):
    return "Eligible" if row['Age'] > 28 else "Not Eligible"

# Apply the function to create a new column
df['Status'] = df.apply(check_eligibility, axis=1)
print(df)

Example 6: Complex Operations Involving Multiple Columns

import pandas as pd

# Create a DataFrame
data = {'First Name': ['Alice', 'Bob', 'Charlie'],
        'Last Name': ['Smith', 'Jones', 'Brown'],
        'Domain': ['@pandasdataframe.com', '@pandasdataframe.com', '@pandasdataframe.com']}
df = pd.DataFrame(data)

# Define a function to create a full email address
def create_email(row):
    return row['First Name'].lower() + "." + row['Last Name'].lower() + row['Domain']

# Apply the function to each row
df['Email'] = df.apply(create_email, axis=1)
print(df)

Example 7: Applying Functions that Return Multiple Values

import pandas as pd

# Create a DataFrame
data = {'Name': ['Alice', 'Bob', 'Charlie'],
        'Age': [25, 30, 35]}
df = pd.DataFrame(data)

# Define a function to return multiple new values
def process_row(row):
    age_next_year = row['Age'] + 1
    age_in_ten_years = row['Age'] + 10
    return pd.Series([age_next_year, age_in_ten_years], index=['Age Next Year', 'Age in Ten Years'])

# Apply the function to each row
df[['Age Next Year', 'Age in Ten Years']] = df.apply(process_row, axis=1)
print(df)
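
A possible alternative, sketched here under the assumption that the function returns a plain list instead of a Series: passing result_type='expand' asks apply to spread each list across columns. The name process_row_as_list and the use of join to attach the new columns are illustrative choices, not part of the example above.

```python
import pandas as pd

df = pd.DataFrame({'Name': ['Alice', 'Bob', 'Charlie'],
                   'Age': [25, 30, 35]})

# Hypothetical variant of process_row that returns a list
def process_row_as_list(row):
    return [row['Age'] + 1, row['Age'] + 10]

# result_type='expand' turns each returned list into columns (labelled 0, 1)
expanded = df.apply(process_row_as_list, axis=1, result_type='expand')

# Give the expanded columns meaningful names, then attach them to df
expanded.columns = ['Age Next Year', 'Age in Ten Years']
result = df.join(expanded)
print(result)
```

Returning a Series with a named index, as in Example 7, keeps the column names next to the values; the list form is shorter but needs the names assigned afterwards.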

Example 8: Error Handling in Apply Functions

import pandas as pd

# Create a DataFrame
data = {'Name': ['Alice', 'Bob', 'Charlie'],
        'Birth Year': [1985, 'unknown', 1990]}
df = pd.DataFrame(data)

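# Define a function to calculate age (reference year fixed at 2023)
def calculate_age(row):
    try:
        return 2023 - int(row['Birth Year'])
    except (ValueError, TypeError):
        # Non-numeric strings raise ValueError; missing values such as None raise TypeError
        return 'Unknown'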

# Apply the function to each row
df['Age'] = df.apply(calculate_age, axis=1)
print(df)

Example 9: Using External Parameters in Apply Functions

import pandas as pd

# Create a DataFrame
data = {'Name': ['Alice', 'Bob', 'Charlie'],
        'Sales': [300, 450, 500]}
df = pd.DataFrame(data)

# Define a function to apply a custom multiplier
def apply_custom_multiplier(row, multiplier):
    return row['Sales'] * multiplier

# Apply the function with an external multiplier
multiplier = 1.1
df['Adjusted Sales'] = df.apply(apply_custom_multiplier, axis=1, args=(multiplier,))
print(df)
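
Extra parameters can also be passed by keyword, since apply forwards additional keyword arguments to the function. The following sketch compares both forms on the same data; the variable names via_args and via_kwargs are illustrative.

```python
import pandas as pd

df = pd.DataFrame({'Name': ['Alice', 'Bob', 'Charlie'],
                   'Sales': [300, 450, 500]})

def apply_custom_multiplier(row, multiplier):
    return row['Sales'] * multiplier

# Positional form: extra arguments go in the args tuple
via_args = df.apply(apply_custom_multiplier, axis=1, args=(1.1,))

# Keyword form: unrecognized keyword arguments are forwarded to the function
via_kwargs = df.apply(apply_custom_multiplier, axis=1, multiplier=1.1)

print(via_args.equals(via_kwargs))
```

The keyword form tends to read more clearly when the function takes several parameters, because each value is named at the call site.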

Example 10: Vectorized Operations vs Apply

import pandas as pd

# Create a DataFrame
data = {'Name': ['Alice', 'Bob', 'Charlie'],
        'Score': [88, 92, 85]}
df = pd.DataFrame(data)

# Define a function to add points
def add_points(row):
    return row['Score'] + 5

# Apply the function to each row
df['New Score'] = df.apply(add_points, axis=1)
print(df)

# Alternatively, use vectorized operations for better performance
df['New Score Vectorized'] = df['Score'] + 5
print(df)
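
To get a rough sense of the difference, the two approaches can be timed on a larger DataFrame. This is only a sketch: the 10,000-row size and the variable names are arbitrary assumptions, and the measured times will vary by machine and pandas version.

```python
import time
import pandas as pd

# Larger illustrative DataFrame (size chosen arbitrarily)
big = pd.DataFrame({'Score': range(10_000)})

# Row-wise apply: the lambda is called once per row
start = time.perf_counter()
via_apply = big.apply(lambda row: row['Score'] + 5, axis=1)
apply_seconds = time.perf_counter() - start

# Vectorized: a single operation on the whole column
start = time.perf_counter()
via_vectorized = big['Score'] + 5
vectorized_seconds = time.perf_counter() - start

print(f"apply: {apply_seconds:.4f}s, vectorized: {vectorized_seconds:.4f}s")
```

Both approaches produce the same values; only the amount of per-row Python overhead differs.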

Conclusion

The apply function is a versatile tool in Pandas for row-wise transformations. Whether you’re modifying a single column, creating new columns, or handling errors, apply with axis=1 lets you express the logic as an ordinary Python function. Because that function is called once per row, prefer a vectorized operation when one exists, as Example 10 shows, and keep apply for logic that is hard to express otherwise.