Pandas agg multiple columns

Pandas is a powerful Python library for data manipulation and analysis. One of its core functionalities is the ability to perform aggregation operations on DataFrame objects. Aggregation refers to the process of combining multiple pieces of data into a single result. When working with multiple columns, Pandas provides various ways to aggregate data efficiently and flexibly. In this article, we will explore how to use the agg function to perform aggregation on multiple columns of a DataFrame.

Introduction to DataFrame Aggregation

Aggregation computes summary statistics, such as sums, averages, or counts, along the rows or columns of a DataFrame. The agg method (an alias for aggregate) applies one or more operations over the specified axis. It is particularly useful when you need to perform different aggregations on different columns.

Basic Syntax of agg

The basic syntax of the agg function is as follows:

DataFrame.agg(func=None, axis=0, *args, **kwargs)
  • func: A function, a function name as a string, a list of functions or names, or a dictionary mapping column labels to functions, names, or lists of them.
  • axis: {0 or 'index', 1 or 'columns'}, default 0. If 0 or 'index', apply the function to each column. If 1 or 'columns', apply the function to each row.
  • args, kwargs: Positional and keyword arguments to pass to func.
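
The parameters above can be illustrated with a minimal sketch. The sample DataFrame and the variable names (col_stats, row_totals, pop_std) are assumptions made for this illustration only.

```python
import pandas as pd

# Hypothetical sample data for illustration.
df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6], 'C': [7, 8, 9]})

# func as a list: the result has one row per function name.
col_stats = df.agg(['sum', 'mean'])

# axis=1: the function is applied across the columns of each row.
row_totals = df.agg('sum', axis=1)

# Extra keyword arguments are forwarded to the function;
# ddof=0 asks std for the population standard deviation.
pop_std = df.agg('std', ddof=0)

print(col_stats)
print(row_totals)
print(pop_std)
```

With the default axis=0, each function sees one column at a time, so the result is indexed by column label. With axis=1, the result is indexed by the row labels instead.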

Examples of Aggregating Multiple Columns

Let’s explore various examples of using the agg function to perform different types of aggregations on multiple columns of a DataFrame.

Example 1: Basic Aggregation

import pandas as pd

# Create a sample DataFrame
data = {
    'A': [1, 2, 3],
    'B': [4, 5, 6],
    'C': [7, 8, 9]
}
df = pd.DataFrame(data)

# Aggregate multiple columns
result = df.agg({
    'A': 'sum',
    'B': 'min',
    'C': 'max'
})
print(result)

Output:

A    6
B    4
C    9
dtype: int64

Example 2: Applying Multiple Functions to Multiple Columns

import pandas as pd

# Create a sample DataFrame
data = {
    'A': [10, 20, 30],
    'B': [40, 50, 60],
    'C': [70, 80, 90]
}
df = pd.DataFrame(data)

# Apply multiple aggregation functions to multiple columns
result = df.agg({
    'A': ['sum', 'mean'],
    'B': ['min', 'max'],
    'C': ['mean', 'std']
})
print(result)

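Output:

The result is a DataFrame with one row per function name (sum, mean, min, max, std) and one column per input column. A cell holds a value only where that function was requested for that column (A: sum 60 and mean 20; B: min 40 and max 60; C: mean 80 and std 10). Every other cell is NaN.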
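
Because the result of Example 2 is a DataFrame indexed by function name, individual values can be read with .loc. The following is a small sketch of that idea; the variable names a_sum, c_std, missing, and a_only are assumptions for illustration.

```python
import pandas as pd

df = pd.DataFrame({'A': [10, 20, 30], 'B': [40, 50, 60], 'C': [70, 80, 90]})
result = df.agg({
    'A': ['sum', 'mean'],
    'B': ['min', 'max'],
    'C': ['mean', 'std']
})

# Rows are function names, columns are the original column names.
a_sum = result.loc['sum', 'A']
c_std = result.loc['std', 'C']

# A function that was not requested for a column yields NaN.
missing = result.loc['sum', 'B']

# Keep only the values that were actually computed for column A.
a_only = result['A'].dropna()

print(a_sum, c_std, pd.isna(missing))
print(a_only)
```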

Example 3: Using Custom Functions

import pandas as pd

# Create a sample DataFrame
data = {
    'A': [100, 200, 300],
    'B': [400, 500, 600],
    'C': [700, 800, 900]
}
df = pd.DataFrame(data)

# Define a custom aggregation function
def range_func(x):
    return x.max() - x.min()

# Apply the custom function to multiple columns
result = df.agg({
    'A': range_func,
    'B': range_func,
    'C': 'sum'
})
print(result)

Output:

A     200
B     200
C    2400
dtype: int64

Example 4: Aggregating with Lambda Functions

import pandas as pd

# Create a sample DataFrame
data = {
    'A': [1000, 2000, 3000],
    'B': [4000, 5000, 6000],
    'C': [7000, 8000, 9000]
}
df = pd.DataFrame(data)

# Apply lambda functions for aggregation
result = df.agg({
    'A': lambda x: x.mean(),
    'B': lambda x: x.sum(),
    'C': lambda x: x.std()
})
print(result)

Output:

A     2000.0
B    15000.0
C     1000.0
dtype: float64

Example 5: Aggregating Across All Columns

import pandas as pd

# Create a sample DataFrame
data = {
    'A': [10, 20, 30],
    'B': [40, 50, 60],
    'C': [70, 80, 90]
}
df = pd.DataFrame(data)

# Aggregate all columns using a single function
result = df.agg('sum')
print(result)

Output:

A     60
B    150
C    240
dtype: int64
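
Applying one function to every column works cleanly here because all columns are numeric. When a DataFrame also contains non-numeric columns, one possible approach is to select the columns to aggregate first. The sketch below assumes a hypothetical 'name' column that is not part of the original example.

```python
import pandas as pd

df = pd.DataFrame({
    'name': ['x', 'y', 'z'],  # hypothetical non-numeric column
    'A': [10, 20, 30],
    'B': [40, 50, 60]
})

# Keep only numeric columns, then aggregate them all.
numeric_totals = df.select_dtypes(include='number').agg('sum')

# Or name the columns to aggregate explicitly.
subset_means = df[['A', 'B']].agg('mean')

print(numeric_totals)
print(subset_means)
```

Selecting columns up front keeps the aggregation from touching columns where the function is not meaningful.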

Example 6: Aggregating with Different Functions for Each Column

import pandas as pd

# Create a sample DataFrame
data = {
    'A': [1, 2, 3],
    'B': [4, 5, 6],
    'C': [7, 8, 9]
}
df = pd.DataFrame(data)

# Apply different functions to each column
result = df.agg({
    'A': 'sum',
    'B': 'mean',
    'C': 'max'
})
print(result)

Output:

A    6.0
B    5.0
C    9.0
dtype: float64

Example 7: Using Named Aggregations for Clarity

import pandas as pd

# Create a sample DataFrame
data = {
    'A': [100, 200, 300],
    'B': [400, 500, 600],
    'C': [700, 800, 900]
}
df = pd.DataFrame(data)

# Use named aggregations for better clarity
result = df.agg(A_sum=('A', 'sum'), B_mean=('B', 'mean'), C_max=('C', 'max'))
print(result)

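Output:

The result is a DataFrame whose rows are labeled A_sum, B_mean, and C_max and whose columns are A, B, and C. Each row holds a single value (600, 500, and 900 respectively) in the column it was computed from; the remaining cells are NaN.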
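
Named aggregation is often used together with groupby, where each keyword becomes a column of the result rather than a row. The sketch below assumes a hypothetical 'Group' column added to the sample data.

```python
import pandas as pd

df = pd.DataFrame({
    'Group': ['X', 'X', 'Y'],  # hypothetical grouping column
    'A': [100, 200, 300],
    'B': [400, 500, 600]
})

# Each keyword names an output column: (input column, function).
summary = df.groupby('Group').agg(
    A_sum=('A', 'sum'),
    B_mean=('B', 'mean')
)
print(summary)
```

The result has one row per group and flat, descriptive column names, which avoids the NaN-filled layout produced by DataFrame.agg with named aggregations.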

Example 8: Aggregating Without Assigning the Result

import pandas as pd

# Create a sample DataFrame
data = {
    'A': [1000, 2000, 3000],
    'B': [4000, 5000, 6000],
    'C': [7000, 8000, 9000]
}
df = pd.DataFrame(data)

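# agg returns a new object and does not modify df in place.
# The result below is discarded because it is not assigned to a variable.
df.agg({
    'A': 'sum',
    'B': 'min',
    'C': 'max'
})
print(df)  # prints the original, unchanged DataFrame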

Output (the original DataFrame, unchanged):

      A     B     C
0  1000  4000  7000
1  2000  5000  8000
2  3000  6000  9000

Example 9: Combining groupby and agg

import pandas as pd

# Create a sample DataFrame
data = {
    'Group': ['X', 'X', 'Y'],
    'A': [1, 2, 3],
    'B': [4, 5, 6],
    'C': [7, 8, 9]
}
df = pd.DataFrame(data)

# Group by 'Group' column and aggregate
result = df.groupby('Group').agg({
    'A': 'sum',
    'B': 'mean',
    'C': 'max'
})
print(result)

Output:

       A    B  C
Group
X      3  4.5  8
Y      3  6.0  9
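
When a grouped aggregation applies several functions per column, the result has two-level (MultiIndex) columns. One common way to flatten them is sketched below; the variable name flat and the underscore-joined naming scheme are assumptions for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    'Group': ['X', 'X', 'Y'],
    'A': [1, 2, 3],
    'B': [4, 5, 6]
})

# Lists of functions per column produce two-level column labels,
# such as ('A', 'sum') and ('A', 'mean').
result = df.groupby('Group').agg({
    'A': ['sum', 'mean'],
    'B': ['min', 'max']
})

# Join the two levels into flat names such as 'A_sum'.
flat = result.copy()
flat.columns = ['_'.join(col) for col in result.columns]
print(flat)
```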

Example 10: Aggregating with Filters

import pandas as pd

# Create a sample DataFrame
data = {
    'A': [10, 20, 30],
    'B': [40, 50, 60],
    'C': [70, 80, 90]
}
df = pd.DataFrame(data)

# Apply aggregation with a filter
result = df[df['A'] > 10].agg({
    'B': 'sum',
    'C': 'mean'
})
print(result)

Output:

B    110.0
C     85.0
dtype: float64

Conclusion

In this article, we explored various ways to use the agg function in Pandas to perform aggregation on multiple columns. We covered basic aggregations, applying multiple functions, using custom and lambda functions, and combining groupby with agg. These techniques are essential for summarizing and analyzing large datasets efficiently.

By mastering these aggregation techniques, you can gain deeper insights into your data and perform complex data analysis tasks with ease. Whether you are working with financial, scientific, or any other type of data, understanding how to aggregate data effectively is a crucial skill in data science.