Pandas agg List

Pandas is a Python library for data manipulation and analysis. One of its core features is aggregation: reducing the values in a DataFrame or Series to summary results such as sums, means, or minimums. The agg function is particularly versatile, because it can apply one aggregation, several aggregations at once, or different aggregations per column. This article explores the agg function in depth and illustrates its usage with multiple examples.

Introduction to Pandas agg Function

The agg function in Pandas (an alias for aggregate) applies one or more operations over the specified axis of a DataFrame or a Series. It accepts built-in aggregation functions named by string, custom functions, or a combination of both, which makes it useful for summarizing data and computing statistics.
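
As a minimal sketch of what "the specified axis" means, the snippet below applies the same aggregation along each axis. The small DataFrame and the variable names are illustrative assumptions, not part of the original examples.

```python
import pandas as pd

# Illustrative data (an assumption made for this sketch)
df = pd.DataFrame({"A": [1, 2, 3], "B": [10, 20, 30]})

# Default axis=0: the function is applied to each column
col_totals = df.agg("sum")

# axis=1: the function is applied to each row instead
row_totals = df.agg("sum", axis=1)

print(col_totals)
print(row_totals)
```

With the default axis the result has one entry per column; with axis=1 it has one entry per row.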

Basic Usage of agg

Before diving into examples, let’s first understand the basic usage of the agg function. Here is a simple example:

import pandas as pd
import numpy as np

# Create a sample DataFrame
data = {
    'A': [1, 2, np.nan],
    'B': [4, np.nan, 6],
    'C': [7, 8, 9]
}
df = pd.DataFrame(data)

# Using agg to find the sum of each column
result = df.agg('sum')
print(result)

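In this example, agg calculates the sum of each column in the DataFrame. Missing values (NaN) are skipped by default, so column A sums to 3.0, column B to 10.0, and column C to 24.0.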
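
The following sketch contrasts the default behavior of 'sum' with a stricter variant that lets missing values propagate. It reuses the same data as the example above. The lambda and the variable names skipped and strict are assumptions introduced for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "A": [1, 2, np.nan],
    "B": [4, np.nan, 6],
    "C": [7, 8, 9],
})

# Default behavior: NaN values are ignored when summing
skipped = df.agg("sum")

# Custom function that lets NaN propagate into the result
strict = df.agg(lambda col: col.sum(skipna=False))

print(skipped)
print(strict)
```

Columns A and B each contain a missing value, so their strict sums are NaN, while column C is unaffected.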

Detailed Examples of Using agg

Now, let’s explore various ways to use the agg function with detailed examples. Each example is self-contained and can be run on its own.

Example 1: Single Function Aggregation

import pandas as pd

# Sample DataFrame
df = pd.DataFrame({
    'A': range(1, 6),
    'B': range(10, 0, -2),
    'C': range(10, 15)
})

# Aggregate using a single function
result = df.agg('mean')
print(result)

Example 2: Multiple Function Aggregation on DataFrame

import pandas as pd

# Sample DataFrame
df = pd.DataFrame({
    'A': range(1, 6),
    'B': range(10, 0, -2),
    'C': range(10, 15)
})

# Aggregate using multiple functions
result = df.agg(['sum', 'min'])
print(result)
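
When a list of functions is passed, the result is a DataFrame with one row per function and one column per original column. The sketch below shows one way to read individual values back out of it; the variable name summary is an assumption used for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    "A": range(1, 6),
    "B": range(10, 0, -2),
    "C": range(10, 15),
})

summary = df.agg(["sum", "min"])

# Rows are labeled by function name, columns by the original column names
print(list(summary.index))
print(summary.loc["sum", "B"])
print(summary.loc["min", "C"])
```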

Example 3: Different Functions for Each Column

import pandas as pd

# Sample DataFrame
df = pd.DataFrame({
    'A': range(1, 6),
    'B': range(10, 0, -2),
    'C': range(10, 15)
})

# Aggregate using different functions for each column
result = df.agg({'A': 'sum', 'B': 'max', 'C': 'mean'})
print(result)

Example 4: Using Custom Functions

import pandas as pd

# Sample DataFrame
df = pd.DataFrame({
    'A': range(1, 6),
    'B': range(10, 0, -2),
    'C': range(10, 15)
})

# Define a custom function
def my_custom_func(x):
    return x.max() - x.min()

# Aggregate using a custom function
result = df.agg(my_custom_func)
print(result)
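
Custom functions can also take extra parameters, because agg forwards additional keyword arguments to the function. The sketch below assumes a hypothetical helper, scaled_range, with a scale parameter; neither name comes from the original example.

```python
import pandas as pd

df = pd.DataFrame({
    "A": range(1, 6),
    "B": range(10, 0, -2),
    "C": range(10, 15),
})

# Hypothetical helper: the range of a column multiplied by a scale factor
def scaled_range(col, scale=1):
    return (col.max() - col.min()) * scale

# The keyword argument scale=10 is passed through to scaled_range
result = df.agg(scaled_range, scale=10)
print(result)
```

Passing parameters this way avoids writing a separate wrapper function for every variation.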

Example 5: Multiple Aggregations on Series

import pandas as pd

# Sample Series
s = pd.Series(range(10, 20))

# Multiple aggregations
result = s.agg(['sum', 'mean'])
print(result)

Example 6: Aggregating with Named Aggregations

import pandas as pd

# Sample DataFrame
df = pd.DataFrame({
    'A': range(1, 6),
    'B': range(10, 0, -2),
    'C': range(10, 15)
})

# Named aggregations
result = df.agg(total_A=('A', 'sum'), mean_C=('C', 'mean'))
print(result)

Example 7: Aggregating with Filters

import pandas as pd

# Sample DataFrame
df = pd.DataFrame({
    'A': range(1, 6),
    'B': range(10, 0, -2),
    'C': range(10, 15)
})

# Aggregate with a filter
result = df[df['A'] > 2].agg('sum')
print(result)

Example 8: Using agg in GroupBy Operations

import pandas as pd

# Sample DataFrame
df = pd.DataFrame({
    'key': ['A', 'B', 'A', 'B', 'A'],
    'data': range(5),
    'values': [100, 200, 300, 400, 500]
})

# Group by 'key' and aggregate
result = df.groupby('key').agg('sum')
print(result)
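
Named aggregations can be combined with groupby as well, which gives each output column an explicit name. The sketch below reuses the DataFrame from this example; the output names total and largest are assumptions chosen for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    "key": ["A", "B", "A", "B", "A"],
    "data": range(5),
    "values": [100, 200, 300, 400, 500],
})

# Each keyword names an output column: (source column, aggregation)
summary = df.groupby("key").agg(
    total=("values", "sum"),
    largest=("values", "max"),
)
print(summary)
```

The result has one row per group and only the columns that were explicitly requested.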

Example 9: Complex Aggregations

import pandas as pd

# Sample DataFrame
df = pd.DataFrame({
    'A': range(1, 6),
    'B': range(10, 0, -2),
    'C': range(10, 15)
})

# Complex aggregations
result = df.agg({
    'A': ['sum', 'min'],
    'B': ['max', 'mean'],
    'C': ['sum', lambda x: x.mean() + 1]
})
print(result)
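
A lambda in the list is labeled <lambda> in the result, which can be hard to read. One possible alternative, sketched below, is to use a named function so that the row label carries its name. The function name mean_plus_one is an assumption introduced for this sketch.

```python
import pandas as pd

df = pd.DataFrame({
    "A": range(1, 6),
    "B": range(10, 0, -2),
    "C": range(10, 15),
})

# Named replacement for the lambda: the row label becomes the function name
def mean_plus_one(col):
    return col.mean() + 1

result = df.agg({
    "A": ["sum", "min"],
    "C": ["sum", mean_plus_one],
})
print(result)
```

Cells for function and column pairs that were not requested are filled with NaN.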

Conclusion

The agg function in Pandas offers a flexible way to aggregate data. It accepts a single function, a list of functions, a per-column dictionary, or named aggregations, and it works on a DataFrame, a Series, or the result of a groupby. Used effectively, it covers a wide range of data summarization tasks that are essential for data analysis and decision-making.