Pandas agg multiple columns
Pandas is a powerful Python library for data manipulation and analysis. One of its core functionalities is the ability to perform aggregation operations on DataFrame objects. Aggregation refers to the process of combining multiple pieces of data into a single result. When working with multiple columns, Pandas provides various ways to aggregate data efficiently and flexibly. In this article, we will explore how to use the agg function to perform aggregation on multiple columns of a DataFrame.
Introduction to DataFrame Aggregation
Aggregation can be used to compute summary statistics, such as sums, averages, or counts, across rows or columns. The agg function in Pandas allows you to apply one or more operations over the specified axis. This function is particularly useful when you need to perform different aggregations on different columns.
Basic Syntax of agg
The basic syntax of the agg function is as follows:
DataFrame.agg(func=None, axis=0, *args, **kwargs)
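func: A function, a string naming a function (such as 'sum'), a list of functions, or a dictionary mapping column labels to functions. It may be omitted when named aggregations are passed as keyword arguments (see Example 7).
axis: {0 or ‘index’, 1 or ‘columns’}, default 0. If 0 or ‘index’, the function is applied to each column. If 1 or ‘columns’, it is applied to each row.
*args, **kwargs: Additional positional and keyword arguments passed to func.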
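As a minimal sketch of the axis and **kwargs parameters, the snippet below reuses the data from Example 1. The helper top_n_sum is hypothetical and written only for this illustration; its keyword argument n reaches it through agg's **kwargs.

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6], 'C': [7, 8, 9]})

# axis=1: the function is applied to each row, so the result is indexed by row label
row_totals = df.agg('sum', axis=1)

# Hypothetical helper: sum of the n largest values in one column
def top_n_sum(col, n):
    return col.nlargest(n).sum()

# The extra keyword argument n=2 is forwarded to top_n_sum,
# which is called once per column because axis defaults to 0
per_column = df.agg(top_n_sum, n=2)

print(row_totals.tolist())  # [12, 15, 18]
print(per_column.tolist())  # [5, 11, 17]
```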
Examples of Aggregating Multiple Columns
Let’s explore various examples of using the agg function to perform different types of aggregations on multiple columns of a DataFrame.
Example 1: Basic Aggregation
import pandas as pd
# Create a sample DataFrame
data = {
'A': [1, 2, 3],
'B': [4, 5, 6],
'C': [7, 8, 9]
}
df = pd.DataFrame(data)
# Aggregate multiple columns
result = df.agg({
'A': 'sum',
'B': 'min',
'C': 'max'
})
print(result)
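The result is a Series indexed by column name: A is 6 (the sum), B is 4 (the minimum), and C is 9 (the maximum).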
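Conceptually, the dictionary form calls each named function on its matching column and collects the scalar results into a Series. The sketch below rebuilds the Example 1 result by hand to make that mapping explicit; it illustrates the behavior only and is not how pandas implements agg internally. The names spec and manual are introduced just for this example.

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6], 'C': [7, 8, 9]})
spec = {'A': 'sum', 'B': 'min', 'C': 'max'}

result = df.agg(spec)

# Hand-built equivalent: call each named method on its column, e.g. df['A'].sum()
manual = pd.Series({col: getattr(df[col], how)() for col, how in spec.items()})

print(result.tolist())        # [6, 4, 9]
print(result.equals(manual))  # True
```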
Example 2: Applying Multiple Functions to Multiple Columns
import pandas as pd
# Create a sample DataFrame
data = {
'A': [10, 20, 30],
'B': [40, 50, 60],
'C': [70, 80, 90]
}
df = pd.DataFrame(data)
# Apply multiple aggregation functions to multiple columns
result = df.agg({
'A': ['sum', 'mean'],
'B': ['min', 'max'],
'C': ['mean', 'std']
})
print(result)
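The result is a DataFrame with one row per function name (sum, mean, min, max, and std) and one column per original column. A cell holds a value only where that function was requested for that column: A has a sum of 60 and a mean of 20, B has a minimum of 40 and a maximum of 60, and C has a mean of 80 and a standard deviation of 10. All other cells are NaN.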
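When every column should get the same functions, a plain list is enough: each function becomes a row of the result, in the order listed, and no NaN padding is needed. A short sketch using the Example 2 data:

```python
import pandas as pd

df = pd.DataFrame({'A': [10, 20, 30], 'B': [40, 50, 60], 'C': [70, 80, 90]})

# The same two functions are applied to every column
summary = df.agg(['sum', 'mean'])

# Row 'sum' holds 60, 150, 240; row 'mean' holds 20, 50, 80
print(summary)
```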
Example 3: Using Custom Functions
import pandas as pd
# Create a sample DataFrame
data = {
'A': [100, 200, 300],
'B': [400, 500, 600],
'C': [700, 800, 900]
}
df = pd.DataFrame(data)
# Define a custom aggregation function
def range_func(x):
    # Spread between the largest and smallest value in the column
    return x.max() - x.min()
# Apply the custom function to multiple columns
result = df.agg({
'A': range_func,
'B': range_func,
'C': 'sum'
})
print(result)
The result is a Series: A and B are both 200 (300 - 100 for A, 600 - 400 for B), and C is 2400.
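Custom functions can also be mixed with built-in names inside a list. In that case pandas labels each result row with the function's name, so a named function such as range_func produces a readable row label. A small sketch on the Example 3 data:

```python
import pandas as pd

df = pd.DataFrame({'A': [100, 200, 300], 'B': [400, 500, 600], 'C': [700, 800, 900]})

def range_func(x):
    # Spread between the largest and smallest value in the column
    return x.max() - x.min()

# Inside a list, the function's __name__ ('range_func') becomes its row label
result = df.agg(['sum', range_func])

# Row 'sum' holds 600, 1500, 2400; row 'range_func' holds 200 for every column
print(result)
```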
Example 4: Aggregating with Lambda Functions
import pandas as pd
# Create a sample DataFrame
data = {
'A': [1000, 2000, 3000],
'B': [4000, 5000, 6000],
'C': [7000, 8000, 9000]
}
df = pd.DataFrame(data)
# Apply lambda functions for aggregation
result = df.agg({
'A': lambda x: x.mean(),
'B': lambda x: x.sum(),
'C': lambda x: x.std()
})
print(result)
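The result is a Series: A is 2000 (the mean), B is 15000 (the sum), and C is 1000 (the sample standard deviation).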
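For standard reductions like these, the lambdas do the same work as the built-in string names. The sketch below checks that both spellings give the same values on the Example 4 data; lambdas remain the natural choice when the computation has no built-in name.

```python
import math
import pandas as pd

df = pd.DataFrame({'A': [1000, 2000, 3000], 'B': [4000, 5000, 6000], 'C': [7000, 8000, 9000]})

with_lambdas = df.agg({'A': lambda x: x.mean(), 'B': lambda x: x.sum(), 'C': lambda x: x.std()})
with_strings = df.agg({'A': 'mean', 'B': 'sum', 'C': 'std'})

# Mean of A, sum of B, and sample standard deviation of C, computed both ways
same = all(math.isclose(a, b) for a, b in zip(with_lambdas, with_strings))
print(with_strings.tolist())  # [2000.0, 15000.0, 1000.0]
print(same)                   # True
```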
Example 5: Aggregating Across All Columns
import pandas as pd
# Create a sample DataFrame
data = {
'A': [10, 20, 30],
'B': [40, 50, 60],
'C': [70, 80, 90]
}
df = pd.DataFrame(data)
# Aggregate all columns using a single function
result = df.agg('sum')
print(result)
The result is a Series of column sums: A is 60, B is 150, and C is 240.
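Applying one function to every column becomes awkward once the DataFrame also holds non-numeric data; depending on the pandas version, a text column may raise an error for a reduction like mean. Because keyword arguments are passed through to the named method, numeric_only=True (a parameter of DataFrame.mean) restricts the aggregation to numeric columns. The 'name' column below is added purely for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    'name': ['x', 'y', 'z'],  # illustrative non-numeric column
    'A': [10, 20, 30],
    'B': [40, 50, 60],
})

# numeric_only=True is forwarded to DataFrame.mean, so 'name' is skipped
means = df.agg('mean', numeric_only=True)
print(means.tolist())  # [20.0, 50.0]
```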
Example 6: Aggregating with Different Functions for Each Column
import pandas as pd
# Create a sample DataFrame
data = {
'A': [1, 2, 3],
'B': [4, 5, 6],
'C': [7, 8, 9]
}
df = pd.DataFrame(data)
# Apply different functions to each column
result = df.agg({
'A': 'sum',
'B': 'mean',
'C': 'max'
})
print(result)
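The result is a Series indexed by column name: A is 6, B is 5 (the mean), and C is 9. Because the mean is a float, the whole Series is stored as float64.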
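A Series of per-column results is often easier to report or concatenate as a single row that keeps the original column names. One way to do that, sketched on the Example 6 data, is to convert it to a one-column DataFrame and transpose it:

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6], 'C': [7, 8, 9]})
result = df.agg({'A': 'sum', 'B': 'mean', 'C': 'max'})

# to_frame() gives one column; .T turns it into one row with columns A, B, C
one_row = result.to_frame().T
print(one_row)
```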
Example 7: Using Named Aggregations for Clarity
import pandas as pd
# Create a sample DataFrame
data = {
'A': [100, 200, 300],
'B': [400, 500, 600],
'C': [700, 800, 900]
}
df = pd.DataFrame(data)
# Use named aggregations for better clarity
result = df.agg(A_sum=('A', 'sum'), B_mean=('B', 'mean'), C_max=('C', 'max'))
print(result)
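The result is a DataFrame rather than a Series: its rows are labeled A_sum, B_mean, and C_max, its columns are A, B, and C, and each row holds a value only in the column it was computed from (600, 500, and 900 respectively), with NaN everywhere else.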
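If you would rather have a flat Series keyed by the new labels, one option is to keep the named aggregations in a dictionary and pick each label's value out of its source column. This is a sketch of one approach, not a built-in pandas feature; spec, wide, and flat are names introduced for the example.

```python
import pandas as pd

df = pd.DataFrame({'A': [100, 200, 300], 'B': [400, 500, 600], 'C': [700, 800, 900]})

# Same named aggregations as Example 7, kept in a dict so they can be reused
spec = {'A_sum': ('A', 'sum'), 'B_mean': ('B', 'mean'), 'C_max': ('C', 'max')}
wide = df.agg(**spec)  # one row per label, NaN outside each label's own column

# Select each label's value from the column it was computed on
flat = pd.Series({label: wide.loc[label, column] for label, (column, _) in spec.items()})
print(flat.tolist())  # [600.0, 500.0, 900.0]
```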
Example 8: Aggregating with No Direct Output
import pandas as pd
# Create a sample DataFrame
data = {
'A': [1000, 2000, 3000],
'B': [4000, 5000, 6000],
'C': [7000, 8000, 9000]
}
df = pd.DataFrame(data)
# agg returns a new object; since it is not assigned, the result is discarded
df.agg({
'A': 'sum',
'B': 'min',
'C': 'max'
})
print(df)
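Because agg returns a new object instead of modifying df in place, and its return value is not assigned here, print(df) shows the original DataFrame unchanged. To keep the aggregate, assign it to a variable as in the earlier examples.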
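The sketch below makes the same point explicit on the Example 8 data: it stores the aggregate under a name and confirms, with a copy taken beforehand, that the original DataFrame was left untouched.

```python
import pandas as pd

df = pd.DataFrame({'A': [1000, 2000, 3000], 'B': [4000, 5000, 6000], 'C': [7000, 8000, 9000]})
before = df.copy()

# Keep the aggregate by assigning it to a name
summary = df.agg({'A': 'sum', 'B': 'min', 'C': 'max'})

print(summary.tolist())   # [6000, 4000, 9000]
print(df.equals(before))  # True: df itself was not modified
```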
Example 9: Combining groupby and agg
import pandas as pd
# Create a sample DataFrame
data = {
'Group': ['X', 'X', 'Y'],
'A': [1, 2, 3],
'B': [4, 5, 6],
'C': [7, 8, 9]
}
df = pd.DataFrame(data)
# Group by 'Group' column and aggregate
result = df.groupby('Group').agg({
'A': 'sum',
'B': 'mean',
'C': 'max'
})
print(result)
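The result is a DataFrame indexed by the group labels X and Y. For X, A sums to 3, B averages 4.5, and C has a maximum of 8; for Y, the corresponding values are 3, 6, and 9.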
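Named aggregation is especially handy with groupby, because each keyword becomes a clearly named output column instead of reusing the input column names. A sketch on the Example 9 data; the output names A_total, B_avg, C_max, and rows are chosen just for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    'Group': ['X', 'X', 'Y'],
    'A': [1, 2, 3],
    'B': [4, 5, 6],
    'C': [7, 8, 9],
})

# Each keyword maps an output column name to a (input column, function) pair
summary = df.groupby('Group').agg(
    A_total=('A', 'sum'),
    B_avg=('B', 'mean'),
    C_max=('C', 'max'),
    rows=('A', 'count'),
)

# reset_index() turns the group labels back into an ordinary column
print(summary.reset_index())
```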
Example 10: Aggregating with Filters
import pandas as pd
# Create a sample DataFrame
data = {
'A': [10, 20, 30],
'B': [40, 50, 60],
'C': [70, 80, 90]
}
df = pd.DataFrame(data)
# Apply aggregation with a filter
result = df[df['A'] > 10].agg({
'B': 'sum',
'C': 'mean'
})
print(result)
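Only the rows where A is 20 or 30 pass the filter, so the result is a Series with B equal to 110 (50 + 60) and C equal to 85 (the mean of 80 and 90).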
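The same filter can also be written as a query expression. It is worth knowing what happens when no row passes: a sum over nothing is 0, while a mean over nothing is NaN. A short sketch on the Example 10 data:

```python
import pandas as pd

df = pd.DataFrame({'A': [10, 20, 30], 'B': [40, 50, 60], 'C': [70, 80, 90]})
spec = {'B': 'sum', 'C': 'mean'}

via_mask = df[df['A'] > 10].agg(spec)
via_query = df.query('A > 10').agg(spec)  # same filter written as an expression
print(via_mask.equals(via_query))  # True

# No row has A > 100, so the sum is 0 and the mean is NaN
empty = df[df['A'] > 100].agg(spec)
print(empty['B'], pd.isna(empty['C']))
```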
Pandas agg multiple columns conclusion
In this article, we explored various ways to use the agg function in Pandas to perform aggregation on multiple columns. We covered basic aggregations, applying multiple functions, using custom and lambda functions, and combining groupby with agg. These techniques are essential for summarizing and analyzing large datasets efficiently.
By mastering these aggregation techniques, you can gain deeper insights into your data and perform complex data analysis tasks with ease. Whether you are working with financial, scientific, or any other type of data, understanding how to aggregate data effectively is a crucial skill in data science.