Pandas apply
Pandas is a powerful Python library for data manipulation and analysis. One of its core features is the apply method, which applies a function along an axis of a DataFrame or to each value of a Series. This guide explores common use cases and provides detailed examples of how to use apply effectively.
Introduction to Pandas apply
The apply method can be called on a DataFrame or a Series. The basic DataFrame syntax is:
DataFrame.apply(func, axis=0, raw=False, result_type=None, args=(), **kwds)
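- func: The function to apply to each column or row.
- axis: {0 or 'index', 1 or 'columns'}, default 0. Axis along which the function is applied:
  - 0 or 'index': apply the function to each column.
  - 1 or 'columns': apply the function to each row.
- raw: bool, default False. If False, each row or column is passed to the function as a Series; if True, it is passed as a NumPy ndarray.
- result_type: {'expand', 'reduce', 'broadcast', None}, default None. Controls the shape of the result when axis=1, for example whether list-like return values are expanded into columns.
- args: Tuple of positional arguments to pass to the function after the row or column.
- **kwds: Additional keyword arguments to pass to the function.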
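The parameters above can be seen side by side in a short sketch. The DataFrame contents and the helper name value_range are made up for illustration; only the apply parameters themselves come from the signature shown earlier.

```python
import numpy as np
import pandas as pd

# Small illustrative frame (values chosen arbitrarily).
df = pd.DataFrame({"A": [1, 2, 3], "B": [10, 20, 30]})

# axis=0 (default): the function receives each column as a Series.
col_totals = df.apply(lambda col: col.sum())

# axis=1: the function receives each row as a Series.
row_totals = df.apply(lambda row: row.sum(), axis=1)

def value_range(values):
    # With raw=True, `values` is a NumPy array rather than a Series.
    assert isinstance(values, np.ndarray)
    return values.max() - values.min()

ranges = df.apply(value_range, raw=True)

# result_type="expand" (axis=1 only): list-like results become columns.
expanded = df.apply(
    lambda row: [row["A"], row["A"] + row["B"]],
    axis=1,
    result_type="expand",
)

print(col_totals, row_totals, ranges, expanded, sep="\n\n")
```

Without result_type="expand", the last call would return a Series whose values are lists instead of a two-column DataFrame.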
Basic Usage of pandas apply
Example 1: Applying a Function to Each Column
import pandas as pd
import numpy as np
df = pd.DataFrame({
    'A': range(1, 5),
    'B': np.random.randn(4)
})
# With the default axis=0, func receives each column as a Series.
def func(x):
    return x * 2
result = df.apply(func)
print(result)
Example 2: Applying a Function to Each Row
import pandas as pd
import numpy as np
df = pd.DataFrame({
    'A': range(1, 5),
    'B': np.random.randn(4)
})
# With axis=1, func receives each row as a Series indexed by column name.
def func(x):
    return x['A'] + x['B']
result = df.apply(func, axis=1)
print(result)
Applying Lambda Functions
Example 3: Using Lambda with pandas apply
import pandas as pd
df = pd.DataFrame({
    'A': range(1, 6),
    'B': range(10, 0, -2)
})
# The lambda receives each column as a Series and adds 10 to every value.
result = df.apply(lambda x: x + 10)
print(result)
Conditional Logic with pandas apply
Example 4: Applying Conditional Logic
import pandas as pd
df = pd.DataFrame({
    'A': range(1, 5),
    'B': [2, 5, 1, 9]
})
# Series.apply calls check once per value in column B.
def check(x):
    return 'High' if x > 5 else 'Low'
result = df['B'].apply(check)
print(result)
Using pandas apply with Multiple Arguments
Example 5: Passing Additional Arguments
import pandas as pd
df = pd.DataFrame({
    'A': range(1, 5),
    'B': [10, 20, 30, 40]
})
# x is each value of column A; y and z are supplied through args.
def multiply(x, y, z):
    return x * y * z
result = df['A'].apply(multiply, args=(5, 2))
print(result)
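Extra arguments can also be forwarded by keyword, which corresponds to the **kwds part of the signature. The sketch below reuses the multiply function from Example 5; the variable names by_position, by_keyword, and mixed are illustrative only.

```python
import pandas as pd

df = pd.DataFrame({"A": range(1, 5), "B": [10, 20, 30, 40]})

def multiply(x, y, z):
    return x * y * z

# Positional extras go through args, in order after the value itself.
by_position = df["A"].apply(multiply, args=(5, 2))

# Keyword extras are forwarded to the function by name.
by_keyword = df["A"].apply(multiply, y=5, z=2)

# Both styles can be combined.
mixed = df["A"].apply(multiply, args=(5,), z=2)

print(by_keyword)
```

Keyword names that match one of apply's own parameters, such as args, are consumed by apply rather than forwarded, so the applied function should avoid those names for its extra parameters.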
Applying Functions that Return Multiple Values
Example 6: Function Returning Multiple Values
import pandas as pd
import numpy as np
df = pd.DataFrame({
    'A': np.random.randn(4),
    'B': np.random.randn(4)
})
# Returning a Series gives one labelled result row per statistic.
def stats(x):
    return pd.Series([x.count(), x.mean(), x.std()], index=['count', 'mean', 'std'])
result = df.apply(stats)
print(result)
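The same idea works row by row. As a minimal sketch, assuming a hypothetical row_stats helper and made-up values, returning a Series from a function applied with axis=1 produces one new column per Series label:

```python
import pandas as pd

# Fixed values so the result is reproducible.
df = pd.DataFrame({"A": [1.0, 2.0, 3.0], "B": [4.0, 6.0, 8.0]})

def row_stats(row):
    # Each label in the returned Series becomes a column in the result.
    return pd.Series({
        "total": row["A"] + row["B"],
        "ratio": row["B"] / row["A"],
    })

result = df.apply(row_stats, axis=1)

# Attach the derived columns to the original frame.
combined = df.join(result)
print(combined)
```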
pandas apply in GroupBy Operations
Example 7: Using pandas apply with GroupBy
import pandas as pd
import numpy as np
df = pd.DataFrame({
    'Key': ['A', 'B', 'A', 'B'],
    'Data': range(1, 5)
})
# Each group's 'Data' values arrive as a Series.
def sum_square(group):
    return np.sum(group**2)
result = df.groupby('Key')['Data'].apply(sum_square)
print(result)
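For a simple reduction like the one in Example 7, the per-group function can often be replaced by transforming the column first and then using a built-in aggregation. The following sketch compares the two approaches on the same data; the names via_apply and via_builtin are illustrative.

```python
import pandas as pd

df = pd.DataFrame({"Key": ["A", "B", "A", "B"], "Data": range(1, 5)})

def sum_square(group):
    return (group ** 2).sum()

# One Python-level call per group.
via_apply = df.groupby("Key")["Data"].apply(sum_square)

# Square once for the whole column, then use the built-in grouped sum.
via_builtin = (df["Data"] ** 2).groupby(df["Key"]).sum()

print(via_apply)
print(via_builtin)
```

Both produce the same totals per key; the second form avoids calling a Python function for every group.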
Performance Considerations
Example 8: Comparing pandas apply with Vectorized Operations
import pandas as pd
import numpy as np
df = pd.DataFrame({
    'A': np.random.rand(1000),
    'B': np.random.rand(1000)
})
# Using apply()
result_apply = df.apply(np.sum)
# Using vectorized operations
result_vectorized = df.sum()
print(result_apply, result_vectorized)
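Example 8 applies np.sum column by column, so the function is called only twice and the cost of apply is hard to see. The gap is usually much clearer with axis=1, where the function runs once per row. The sketch below is a rough timing comparison, assuming a 10,000-row frame with a fixed random seed; absolute timings will vary by machine and pandas version.

```python
import time

import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
df = pd.DataFrame({"A": rng.random(10_000), "B": rng.random(10_000)})

# Row-wise apply: one Python function call per row.
start = time.perf_counter()
row_wise = df.apply(lambda row: row["A"] + row["B"], axis=1)
apply_seconds = time.perf_counter() - start

# Vectorized: a single operation over whole columns.
start = time.perf_counter()
vectorized = df["A"] + df["B"]
vectorized_seconds = time.perf_counter() - start

print(f"apply: {apply_seconds:.4f}s, vectorized: {vectorized_seconds:.4f}s")
```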
Advanced Usage of pandas apply
Example 9: Applying a Complex Function
import pandas as pd
import numpy as np
df = pd.DataFrame({
    'A': np.linspace(1, 10, 10),
    'B': np.random.rand(10)
})
# row is a Series; the branch taken depends on the value in column A.
def complex_function(row):
    if row['A'] > 5:
        return row['A'] * 2
    else:
        return row['B'] / 2
result = df.apply(complex_function, axis=1)
print(result)
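Row-wise branching like that in Example 9 can often be written without apply. As one possible alternative, the sketch below uses np.where, which is a NumPy function rather than part of apply, and checks that it matches the row-wise result. Column B uses fixed values here instead of random ones so the comparison is reproducible.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "A": np.linspace(1, 10, 10),
    "B": np.arange(10) / 10,  # fixed values instead of random ones
})

def complex_function(row):
    if row["A"] > 5:
        return row["A"] * 2
    return row["B"] / 2

via_apply = df.apply(complex_function, axis=1)

# np.where evaluates the condition for every row at once and picks
# from the second or third argument accordingly.
via_where = pd.Series(
    np.where(df["A"] > 5, df["A"] * 2, df["B"] / 2),
    index=df.index,
)

print(via_where)
```

Note that np.where computes both branches for every row before selecting, which is fine for cheap arithmetic like this but may matter when a branch is expensive or invalid for some rows.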
pandas apply Conclusion
The apply method is a versatile tool in Pandas for running a function across DataFrame rows or columns, or over the values of a Series, which enables complex data manipulations and analyses. Because it calls the function once per row, column, or element, it can be slow on large datasets. Using apply judiciously, and preferring vectorized operations and other Pandas functionality where an equivalent exists, leads to efficient and readable code.