Pandas apply

Pandas is a powerful Python library for data manipulation and analysis. One of its core methods is apply, which calls a function along an axis of a DataFrame (once per column or once per row) or on each value of a Series. This guide walks through common use cases with worked examples.

Introduction to Pandas apply

The apply method is available on both DataFrame and Series. The basic DataFrame syntax is:

DataFrame.apply(func, axis=0, raw=False, result_type=None, args=(), **kwds)
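  • func: The function to apply to each column or row.
  • axis: {0 or 'index', 1 or 'columns'}, default 0. Axis along which the function is applied:
    • 0 or 'index': apply function to each column.
    • 1 or 'columns': apply function to each row.
  • raw: bool, default False. If False, each row or column is passed to func as a Series; if True, it is passed as a NumPy ndarray.
  • result_type: {'expand', 'reduce', 'broadcast', None}, default None. Controls the shape of the result when func returns list-like values; it only takes effect when axis=1.
  • args: Positional arguments to pass to func in addition to the array/Series.
  • **kwds: Additional keyword arguments to pass to func.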

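The less obvious parameters are easier to follow in action. The snippet below is a minimal sketch rather than part of the original examples; the frame df and the helper scale are hypothetical names chosen only for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical data used only for this illustration
df = pd.DataFrame({'A': [1, 2, 3], 'B': [10, 20, 30]})

# raw=True hands each column to the function as a NumPy ndarray, not a Series
is_ndarray = df.apply(lambda col: isinstance(col, np.ndarray), raw=True)

# result_type='expand' turns the list returned for each row into columns
expanded = df.apply(lambda row: [row['A'], row['A'] + row['B']],
                    axis=1, result_type='expand')

# Hypothetical helper: extra positional and keyword arguments are forwarded to it
def scale(col, factor, offset=0):
    return col * factor + offset

scaled = df.apply(scale, args=(2,), offset=1)

print(is_ndarray)
print(expanded)
print(scaled)
```

Here factor is supplied through args and offset through a keyword argument; both reach scale unchanged on every call.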
Basic Usage of pandas apply

Example 1: Applying a Function to Each Column

import pandas as pd
import numpy as np

df = pd.DataFrame({
    'A': range(1, 5),
    'B': np.random.randn(4)
})

# With the default axis=0, func receives each column as a Series
def func(x):
    return x * 2

result = df.apply(func)
print(result)

Example 2: Applying a Function to Each Row

import pandas as pd
import numpy as np

df = pd.DataFrame({
    'A': range(1, 5),
    'B': np.random.randn(4)
})

# With axis=1, func receives each row as a Series indexed by column name
def func(x):
    return x['A'] + x['B']

result = df.apply(func, axis=1)
print(result)

Applying Lambda Functions

Example 3: Using Lambda with pandas apply

import pandas as pd

df = pd.DataFrame({
    'A': range(1, 6),
    'B': range(10, 0, -2)
})

result = df.apply(lambda x: x + 10)
print(result)

Conditional Logic with pandas apply

Example 4: Applying Conditional Logic

import pandas as pd

df = pd.DataFrame({
    'A': range(1, 5),
    'B': [2, 5, 1, 9]
})

# Series.apply calls check once per element of column B
def check(x):
    return 'High' if x > 5 else 'Low'

result = df['B'].apply(check)
print(result)


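For a simple two-way condition like this one, the same labels can usually be produced without calling a Python function per element. The following sketch assumes NumPy's np.where as the vectorized counterpart; labels_apply and labels_where are illustrative names, not part of the original example.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'A': range(1, 5), 'B': [2, 5, 1, 9]})

def check(x):
    return 'High' if x > 5 else 'Low'

# Element-wise version, as in Example 4
labels_apply = df['B'].apply(check)

# Vectorized version: the condition is evaluated on the whole column at once
labels_where = pd.Series(np.where(df['B'] > 5, 'High', 'Low'),
                         index=df.index, name='B')

print(labels_apply.tolist())
print(labels_where.tolist())
```

Both approaches give the same labels; apply stays useful once the logic grows beyond what a single condition can express.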
Using pandas apply with Multiple Arguments

Example 5: Passing Additional Arguments

import pandas as pd

df = pd.DataFrame({
    'A': range(1, 5),
    'B': [10, 20, 30, 40]
})

# x is each element of column A; y and z are filled from args
def multiply(x, y, z):
    return x * y * z

result = df['A'].apply(multiply, args=(5, 2))
print(result)


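Extra arguments can also be passed by name, which tends to read more clearly than a positional tuple. This is a small sketch reusing the multiply function from Example 5; by_position and by_keyword are illustrative names.

```python
import pandas as pd

df = pd.DataFrame({'A': range(1, 5), 'B': [10, 20, 30, 40]})

def multiply(x, y, z):
    return x * y * z

# Positional form, as in Example 5
by_position = df['A'].apply(multiply, args=(5, 2))

# Keyword form: unrecognized keyword arguments are forwarded to multiply
by_keyword = df['A'].apply(multiply, y=5, z=2)

print(by_position.tolist())
print(by_keyword.tolist())
```

The two calls are equivalent; the keyword form makes it harder to swap y and z by accident.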
Applying Functions that Return Multiple Values

Example 6: Function Returning Multiple Values

import pandas as pd
import numpy as np

df = pd.DataFrame({
    'A': np.random.randn(4),
    'B': np.random.randn(4)
})

# Returning a Series per column yields a DataFrame: one row per statistic
def stats(x):
    return pd.Series([x.count(), x.mean(), x.std()], index=['count', 'mean', 'std'])

result = df.apply(stats)
print(result)


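The same idea works row by row: when the function returns a Series for each row, the labels of that Series become the columns of the result. The sketch below uses fixed, hypothetical values and an illustrative helper named row_summary so that the output is reproducible.

```python
import pandas as pd

# Hypothetical fixed data, chosen so the result is easy to verify by hand
df = pd.DataFrame({'A': [1.0, 2.0, 3.0], 'B': [4.0, 6.0, 8.0]})

def row_summary(row):
    # Each key of the returned Series becomes a column in the result
    return pd.Series({'total': row['A'] + row['B'],
                      'diff': row['B'] - row['A']})

summary = df.apply(row_summary, axis=1)
print(summary)
```

The result has one row per input row and the columns total and diff.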
pandas apply in GroupBy Operations

Example 7: Using pandas apply with GroupBy

import pandas as pd
import numpy as np

df = pd.DataFrame({
    'Key': ['A', 'B', 'A', 'B'],
    'Data': range(1, 5)
})

# group is the Series of 'Data' values belonging to one key
def sum_square(group):
    return np.sum(group**2)

result = df.groupby('Key')['Data'].apply(sum_square)
print(result)


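When the per-group function is a transformation followed by a standard reduction, the transformation can often be done up front and the reduction left to a built-in GroupBy method. The following sketch shows one such rewrite for the sum of squares; via_apply and via_builtin are illustrative names.

```python
import pandas as pd

df = pd.DataFrame({'Key': ['A', 'B', 'A', 'B'], 'Data': range(1, 5)})

# GroupBy.apply with a Python function, equivalent to Example 7
via_apply = df.groupby('Key')['Data'].apply(lambda g: (g ** 2).sum())

# Square the column first, then group by Key and use the built-in sum
via_builtin = (df['Data'] ** 2).groupby(df['Key']).sum()

print(via_apply)
print(via_builtin)
```

Both give 10 for key A (1 + 9) and 20 for key B (4 + 16).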
Performance Considerations

Example 8: Comparing pandas apply with Vectorized Operations

import pandas as pd
import numpy as np

df = pd.DataFrame({
    'A': np.random.rand(1000),
    'B': np.random.rand(1000)
})

# Using apply()
result_apply = df.apply(np.sum)

# Using vectorized operations
result_vectorized = df.sum()

print(result_apply, result_vectorized)


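Example 8 shows that both approaches return the same sums, but not how long each takes. A rough way to measure the gap is the standard-library timeit module. The sketch below is an assumption-laden illustration: the row count, the seed, and the helper names row_sum_apply and row_sum_vectorized are arbitrary choices, and the timings will vary from machine to machine.

```python
import timeit

import numpy as np
import pandas as pd

# Seeded generator so the data is reproducible; 5000 rows is an arbitrary size
rng = np.random.default_rng(0)
df = pd.DataFrame({'A': rng.random(5000), 'B': rng.random(5000)})

def row_sum_apply():
    # Calls the lambda once per row from Python
    return df.apply(lambda row: row['A'] + row['B'], axis=1)

def row_sum_vectorized():
    # Adds the two columns in a single vectorized operation
    return df['A'] + df['B']

# Both approaches must agree before their timings are worth comparing
same_result = np.allclose(row_sum_apply(), row_sum_vectorized())

t_apply = timeit.timeit(row_sum_apply, number=3)
t_vectorized = timeit.timeit(row_sum_vectorized, number=3)

print(f"same result: {same_result}")
print(f"apply(axis=1): {t_apply:.4f} s, vectorized: {t_vectorized:.4f} s")
```

Row-wise apply is usually much slower here because the function runs once per row in the Python interpreter, while the vectorized addition runs in compiled code.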
Advanced Usage of pandas apply

Example 9: Applying a Complex Function

import pandas as pd
import numpy as np

df = pd.DataFrame({
    'A': np.linspace(1, 10, 10),
    'B': np.random.rand(10)
})

# Branch on column A: double A when it exceeds 5, otherwise halve B
def complex_function(row):
    if row['A'] > 5:
        return row['A'] * 2
    else:
        return row['B'] / 2

result = df.apply(complex_function, axis=1)
print(result)


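Branching logic of this shape can often be rewritten with Series.where, which keeps values where a condition holds and substitutes another value elsewhere. The following sketch checks that such a rewrite matches complex_function; the seed and the names via_apply and via_where are assumptions made for illustration.

```python
import numpy as np
import pandas as pd

# Seeded generator so the comparison is reproducible
rng = np.random.default_rng(0)
df = pd.DataFrame({'A': np.linspace(1, 10, 10), 'B': rng.random(10)})

def complex_function(row):
    if row['A'] > 5:
        return row['A'] * 2
    else:
        return row['B'] / 2

# Row-wise version, as in Example 9
via_apply = df.apply(complex_function, axis=1)

# Vectorized version: keep A * 2 where A > 5, otherwise fall back to B / 2
via_where = (df['A'] * 2).where(df['A'] > 5, df['B'] / 2)

print(np.allclose(via_apply, via_where))
```

The row-wise form remains the more readable option when the branches depend on many columns or on logic that has no vectorized equivalent.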
Conclusion

The apply method is a versatile tool that runs a function across DataFrame rows or columns, or over the values of a Series, which makes complex data manipulations straightforward to express. It is also comparatively slow on large datasets, because the function is called from Python once per row, column, or element. Prefer built-in vectorized operations where they exist, and reserve apply for logic that cannot be expressed that way; used like this, it leads to code that is both efficient and readable.