Pandas Apply Function
The `apply` function in pandas lets you apply a function along an axis of a DataFrame or to the values of a Series. It is especially useful for data manipulation and analysis in Python. This article explores `apply` in depth, with examples covering its most common uses.
Introduction to Pandas Apply
The `apply` function works on both Series and DataFrame objects. On a DataFrame, it applies a function to each column (the default) or to each row. On a Series, it applies a function to each element. This flexibility makes `apply` a go-to tool for data transformation and preprocessing.
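The three modes described above can be compared side by side. This is a minimal sketch assuming a small, all-numeric DataFrame (the columns `A` and `B` are illustrative only): with the default `axis=0` the function receives each column as a Series, with `axis=1` it receives each row, and `Series.apply` calls the function on each individual value.

```python
import pandas as pd

# Small numeric DataFrame used only for illustration
df = pd.DataFrame({'A': [1, 2, 3], 'B': [10, 20, 30]})

# axis=0 (default): the function is called once per column
col_sums = df.apply(lambda col: col.sum())

# axis=1: the function is called once per row
row_sums = df.apply(lambda row: row.sum(), axis=1)

# Series.apply: the function is called once per element
squared = df['A'].apply(lambda x: x ** 2)

print(col_sums)
print(row_sums)
print(squared)
```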
Syntax of Apply
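The basic syntax of `DataFrame.apply` is:

```python
DataFrame.apply(func, axis=0, raw=False, result_type=None, args=(), **kwds)
```

Recent pandas releases accept a few additional optional parameters, but the ones below cover everyday use.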
- func: The function to apply to each column or row.
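- axis: Axis along which the function is applied: `0` (or `'index'`) applies the function to each column, `1` (or `'columns'`) applies it to each row.
- raw: If `False` (the default), each row or column is passed to the function as a Series; if `True`, it is passed as a NumPy ndarray.
- result_type: One of `'expand'`, `'reduce'`, `'broadcast'`, or `None`. It controls the shape of the result and only takes effect when `axis=1`. With the default (`None`), the return type is inferred from what the function returns.
- args: Positional arguments to pass to the function, in addition to the row or column.
- kwds: Additional keyword arguments to pass to the function.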
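A minimal sketch of how `raw` and `result_type` change what the function sees and what `apply` returns, assuming pandas 1.x or 2.x; the DataFrame and the lambdas are illustrative only.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]})

# raw=False (default): each column reaches the function as a pandas Series
gets_series = df.apply(lambda col: isinstance(col, pd.Series))

# raw=True: each column reaches the function as a plain NumPy ndarray
gets_ndarray = df.apply(lambda col: isinstance(col, np.ndarray), raw=True)

# result_type='expand' (only acts with axis=1): the list returned for
# each row is spread into separate columns labelled 0, 1, ...
expanded = df.apply(lambda row: [row['A'], row['A'] * 2],
                    axis=1, result_type='expand')

print(gets_series)
print(gets_ndarray)
print(expanded)
```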
Examples of Using Apply
Example 1: Applying a Simple Function to a DataFrame Column
```python
import pandas as pd

# Create a DataFrame
df = pd.DataFrame({
    'A': range(1, 6),
    'B': ['pandasdataframe.com' for _ in range(5)]
})

# Define a simple function
def add_five(x):
    return x + 5

# Apply the function to each element of column 'A'
df['A'] = df['A'].apply(add_five)
print(df)
```
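For simple arithmetic like `add_five`, a vectorized expression gives the same values without calling a Python function for every element, and is usually faster on large columns. A minimal comparison, assuming the same column `A` as above:

```python
import pandas as pd

df = pd.DataFrame({'A': range(1, 6)})

def add_five(x):
    return x + 5

# Element by element: add_five is called once per value
via_apply = df['A'].apply(add_five)

# Vectorized: the addition is performed on the whole column at once
via_vectorized = df['A'] + 5

print((via_apply == via_vectorized).all())
```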
Example 2: Applying a Function Using Lambda
```python
import pandas as pd

# Create a DataFrame
df = pd.DataFrame({
    'A': range(1, 6),
    'B': ['pandasdataframe.com' for _ in range(5)]
})

# Apply a lambda function that doubles each value in column 'A'
df['A'] = df['A'].apply(lambda x: x * 2)
print(df)
```
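Lambdas work on any column type, not only numbers. A minimal sketch, assuming the text column `B` from these examples, that keeps the part of each string before the first dot; for common string operations, pandas' `.str` accessor offers a vectorized route to the same result:

```python
import pandas as pd

df = pd.DataFrame({'B': ['pandasdataframe.com' for _ in range(5)]})

# Lambda applied to each string value
via_lambda = df['B'].apply(lambda s: s.split('.')[0])

# Equivalent using the vectorized .str accessor
via_str = df['B'].str.split('.').str[0]

print(via_lambda.tolist())
print(via_str.tolist())
```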
Example 3: Applying a Function Across Rows
```python
import pandas as pd

# Create a DataFrame
df = pd.DataFrame({
    'A': range(1, 6),
    'B': range(10, 15),
    'C': ['pandasdataframe.com' for _ in range(5)]
})

# Define a function that sums columns 'A' and 'B' of a row
def sum_row(row):
    return row['A'] + row['B']

# Apply the function across rows (axis=1)
df['Sum'] = df.apply(sum_row, axis=1)
print(df)
```
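Row-wise `apply` calls `sum_row` once per row, which can be slow on large frames. When the row logic is plain arithmetic on columns, column operations produce the same values. A minimal comparison using the same data as Example 3:

```python
import pandas as pd

df = pd.DataFrame({
    'A': range(1, 6),
    'B': range(10, 15),
    'C': ['pandasdataframe.com' for _ in range(5)]
})

def sum_row(row):
    return row['A'] + row['B']

# Row-wise: sum_row receives each row as a Series
via_apply = df.apply(sum_row, axis=1)

# Column-wise arithmetic; the text column 'C' is simply not used
via_columns = df['A'] + df['B']

print((via_apply == via_columns).all())
```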
Example 4: Using Apply with Additional Arguments
```python
import pandas as pd

# Create a DataFrame
df = pd.DataFrame({
    'A': range(1, 6),
    'B': ['pandasdataframe.com' for _ in range(5)]
})

# Define a function that takes an additional argument
def multiply_by_factor(x, factor):
    return x * factor

# Pass the extra argument through args (a one-element tuple)
df['A'] = df['A'].apply(multiply_by_factor, args=(10,))
print(df)
```
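Besides `args`, extra keyword arguments given to `apply` are forwarded to the function too, and a lambda can bind the argument instead. A minimal sketch of three equivalent calls, assuming `multiply_by_factor` as defined above:

```python
import pandas as pd

df = pd.DataFrame({'A': range(1, 6)})

def multiply_by_factor(x, factor):
    return x * factor

# Positional extra argument via args (note the trailing comma: a 1-tuple)
by_args = df['A'].apply(multiply_by_factor, args=(10,))

# Keyword extra argument, forwarded to the function by apply
by_keyword = df['A'].apply(multiply_by_factor, factor=10)

# Binding the argument inside a lambda
by_lambda = df['A'].apply(lambda x: multiply_by_factor(x, 10))

print(by_args.tolist(), by_keyword.tolist(), by_lambda.tolist())
```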
Example 5: Applying a Function that Returns Multiple Values
```python
import pandas as pd

# Create a DataFrame
df = pd.DataFrame({
    'A': range(1, 6),
    'B': ['pandasdataframe.com' for _ in range(5)]
})

# Define a function that returns multiple values (as a tuple)
def func(x):
    return x, x * 2

# Apply the function, then expand each tuple into separate columns
df[['A1', 'A2']] = df['A'].apply(func).apply(pd.Series)
print(df)
```
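Calling `.apply(pd.Series)` builds a separate Series for every row. A common alternative, sketched here under the same assumptions as Example 5, is to convert the Series of tuples to a list and build both new columns with a single DataFrame constructor call, reusing the original index so the rows line up:

```python
import pandas as pd

df = pd.DataFrame({'A': range(1, 6)})

def func(x):
    return x, x * 2

# Series whose values are (x, x * 2) tuples
pairs = df['A'].apply(func)

# One constructor call; index=df.index keeps rows aligned with df
df[['A1', 'A2']] = pd.DataFrame(pairs.tolist(), index=df.index)
print(df)
```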
Example 6: Error Handling in Apply Functions
```python
import pandas as pd

# Create a DataFrame
df = pd.DataFrame({
    'A': [1, 2, 0, 4, 5],
    'B': ['pandasdataframe.com' for _ in range(5)]
})

# Define a function with error handling
def safe_divide(x):
    try:
        return 10 / x
    except ZeroDivisionError:
        return 'undefined'

# Apply the function to column 'A'
df['Result'] = df['A'].apply(safe_divide)
print(df)
```
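Returning the string `'undefined'` mixes text with floats, so the `Result` column ends up with `object` dtype. If the column should stay numeric, a common choice is to return `NaN` instead, which pandas treats as a missing value. A minimal variant of `safe_divide`:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 0, 4, 5]})

def safe_divide(x):
    try:
        return 10 / x
    except ZeroDivisionError:
        # NaN keeps the result column numeric (float64)
        return np.nan

df['Result'] = df['A'].apply(safe_divide)
print(df)
print(df['Result'].dtype)
```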
Example 7: Applying a Function that Uses External Data
```python
import pandas as pd

# Create a DataFrame
df = pd.DataFrame({
    'A': range(1, 6),
    'B': ['pandasdataframe.com' for _ in range(5)]
})

# External data: a lookup table keyed by the values in column 'A'
external_data = {1: 100, 2: 200, 3: 300, 4: 400, 5: 500}

# Define a function that uses the external data
def add_external_data(x):
    return x + external_data[x]

# Apply the function to column 'A'
df['A'] = df['A'].apply(add_external_data)
print(df)
```
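When the function is essentially a dictionary lookup, `Series.map` accepts the dictionary directly. One difference worth knowing: `external_data[x]` inside `apply` raises a `KeyError` for a missing key, whereas `map` produces `NaN`. A minimal sketch with the same lookup table:

```python
import pandas as pd

df = pd.DataFrame({'A': range(1, 6)})
external_data = {1: 100, 2: 200, 3: 300, 4: 400, 5: 500}

# apply version: explicit lookup inside the function
via_apply = df['A'].apply(lambda x: x + external_data[x])

# map version: look up each value in the dict, then add column-wise
via_map = df['A'] + df['A'].map(external_data)

# A key missing from the dict becomes NaN with map
missing = pd.Series([1, 6]).map(external_data)

print(via_apply.tolist(), via_map.tolist(), missing.tolist())
```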
Example 8: Using Apply to Implement Conditional Logic
```python
import pandas as pd

# Create a DataFrame
df = pd.DataFrame({
    'A': [10, 20, 30, 40, 50],
    'B': ['pandasdataframe.com' for _ in range(5)]
})

# Define a function with conditional logic
def check_value(x):
    if x > 25:
        return 'High'
    else:
        return 'Low'

# Apply the function to column 'A'
df['Category'] = df['A'].apply(check_value)
print(df)
```
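For a simple two-way condition like `check_value`, NumPy's `np.where` evaluates the comparison on the whole column at once. A minimal sketch that reproduces the `'High'`/`'Low'` labels, assuming the same threshold of 25:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'A': [10, 20, 30, 40, 50]})

def check_value(x):
    return 'High' if x > 25 else 'Low'

via_apply = df['A'].apply(check_value)

# Vectorized: 'High' where A > 25, otherwise 'Low'
via_where = np.where(df['A'] > 25, 'High', 'Low')

print(via_apply.tolist())
print(via_where.tolist())
```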
Pandas Apply Function Conclusion
The `apply` function in pandas is a versatile tool for a wide range of data manipulation tasks. It lets you apply both simple and complex functions to DataFrame and Series objects, whether element by element, row by row, or column by column. Because `apply` typically calls a Python function once per element, row, or column, it is usually slower than pandas' built-in vectorized operations, so it works best for logic those operations cannot express directly. Understanding how and when to use `apply` can significantly improve your data processing workflows in Python.