Pandas agg and quantile
Pandas is a powerful Python library for data manipulation and analysis, providing flexible data structures that make it easy to manipulate numerical tables and time series. This guide focuses on two particularly useful methods: agg and quantile. These functions are essential for summarizing data, allowing for both simple and complex aggregations.
Introduction to Pandas agg Function
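The agg function in Pandas applies one or more operations along a chosen axis of a DataFrame (by default, to each column). It accepts a function, a function name given as a string (such as 'sum'), a list of these, or a dictionary that maps column labels to the operations for each column, so you can mix built-in summaries with your own custom functions in a single call.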
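The sketch below uses a small, fixed DataFrame (the values are arbitrary and chosen only so the results are easy to check) to show the kinds of arguments agg accepts: a function name, a list of names, a dictionary mapping columns to functions, and a plain callable.

```python
import pandas as pd

# Small deterministic DataFrame so the results are predictable
df = pd.DataFrame({'A': [1, 2, 3, 4], 'B': [10, 20, 30, 40]})

# A single function name: one value per column, returned as a Series
print(df.agg('sum'))

# A list of names: one row per function, one column per input column (a DataFrame)
print(df.agg(['min', 'max']))

# A dict: different functions for different columns
print(df.agg({'A': 'sum', 'B': 'mean'}))

# A callable: here it is applied to each column, which arrives as a Series
print(df.agg(lambda col: col.max() - col.min()))
```

The shape of the result follows the shape of the argument: scalar results per column give a Series, while lists of functions give a DataFrame indexed by function name.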
Example 1: Basic Usage of agg with a Single Function
import pandas as pd
import numpy as np
# Create a sample DataFrame
df = pd.DataFrame({
    'A': range(1, 6),
    'B': np.random.randint(1, 100, 5)
})
# Using `agg` to calculate the sum of column A
result = df.agg({'A': 'sum'})
print(result)
Output: a Series with a single entry, A = 15 (the sum of 1 through 5).
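Whether a dictionary value is a plain string or a list changes the type of the result. This minimal sketch (with arbitrary fixed values for B) contrasts the two forms.

```python
import pandas as pd

df = pd.DataFrame({'A': range(1, 6), 'B': [5, 3, 8, 1, 9]})

# A plain string per column yields a Series indexed by column name
as_series = df.agg({'A': 'sum'})
print(type(as_series).__name__, as_series['A'])

# Wrapping the function in a list yields a DataFrame indexed by function name
as_frame = df.agg({'A': ['sum']})
print(type(as_frame).__name__, as_frame.loc['sum', 'A'])
```

Using the list form consistently can make downstream code simpler, because the result is always a DataFrame.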
Example 2: Using agg with Multiple Functions
import pandas as pd
import numpy as np
# Create a DataFrame
df = pd.DataFrame({
    'A': np.random.randn(10),
    'B': np.random.randint(1, 100, 10)
})
# Using `agg` to apply multiple functions to column A
result = df.agg({'A': ['sum', 'min', 'max']})
print(result)
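Output: a DataFrame with one row per function (sum, min, max) and a single column, A. The values change on every run because the data is random.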
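If you want the same functions for every column, you can pass the list directly instead of a dictionary. A small sketch with fixed, arbitrary data:

```python
import pandas as pd

df = pd.DataFrame({'A': [1.5, -2.0, 0.5], 'B': [10, 40, 25]})

# Passing a list (rather than a dict) applies every function to every column
summary = df.agg(['sum', 'min', 'max'])
print(summary)
```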
Introduction to Pandas quantile Function
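The quantile function in Pandas returns the value(s) at a given quantile q, a number between 0 and 1 (the default, 0.5, gives the median), computed along the requested axis. Quantiles are a quick way to describe how the data is distributed. When a quantile falls between two data points, the result is linearly interpolated by default; the interpolation parameter selects other rules ('lower', 'higher', 'midpoint', 'nearest').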
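To make the interpolation rule concrete, here is a small sketch on four sorted values. With n = 4 values, the 0.25 quantile sits at position (n - 1) * 0.25 = 0.75, that is, three quarters of the way from the first value to the second.

```python
import pandas as pd

s = pd.Series([1, 2, 3, 4])

# Default 'linear': 1 + 0.75 * (2 - 1)
print(s.quantile(0.25))                            # 1.75

# Other interpolation modes use a neighbouring value or the midpoint
print(s.quantile(0.25, interpolation='lower'))     # lower neighbour, value 1
print(s.quantile(0.25, interpolation='higher'))    # upper neighbour, value 2
print(s.quantile(0.25, interpolation='midpoint'))  # halfway between, value 1.5
```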
Example 3: Basic Usage of quantile
import pandas as pd
import numpy as np
# Create a DataFrame
df = pd.DataFrame({
    'A': np.random.randn(100),
    'B': np.random.randint(1, 100, 100)
})
# Calculate the 50th percentile (median) for all columns
medians = df.quantile(0.5)
print(medians)
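Output: a Series holding the median of A and of B. The values differ between runs because the data is random.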
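Real DataFrames often contain non-numeric columns. In pandas 2.0 and later, DataFrame.quantile defaults to numeric_only=False, so a string column can make the call fail; passing numeric_only=True restricts the calculation to numeric columns. A minimal sketch, assuming a hypothetical 'label' column:

```python
import pandas as pd

df = pd.DataFrame({
    'A': [1.0, 2.0, 3.0],
    'B': [10, 20, 60],
    'label': ['x', 'y', 'z'],  # hypothetical non-numeric column
})

# Only A and B take part in the calculation
medians = df.quantile(0.5, numeric_only=True)
print(medians)
```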
Example 4: Multiple Quantiles
import pandas as pd
import numpy as np
# Create a DataFrame
df = pd.DataFrame({
    'A': np.random.randn(100),
    'B': np.random.randint(1, 100, 100)
})
# Calculate multiple quantiles for column A
quantiles = df['A'].quantile([0.25, 0.5, 0.75])
print(quantiles)
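Output: a Series indexed by the quantile levels 0.25, 0.5 and 0.75. The values vary between runs because the data is random.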
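Calling quantile with a list on the whole DataFrame, rather than on a single column, returns a DataFrame instead of a Series. A small sketch with fixed data:

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3, 4, 5], 'B': [10, 20, 30, 40, 50]})

# One row per requested quantile, one column per input column
q = df.quantile([0.25, 0.5, 0.75])
print(q)
```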
Combining agg and quantile
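You can combine agg and quantile by wrapping quantile in a function (for example, a lambda that receives each column as a Series), which lets you compute quantiles alongside other aggregations in a single call.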
Example 5: Using agg with quantile
import pandas as pd
import numpy as np
# Create a DataFrame
df = pd.DataFrame({
    'A': np.random.randn(50),
    'B': np.random.randint(1, 100, 50)
})
# Using `agg` to apply `quantile` (via a lambda) along with other functions
result = df.agg({
    'A': ['mean', 'std', lambda x: x.quantile(0.5)],
    'B': ['min', 'max', lambda x: x.quantile(0.75)]
})
print(result)
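Output: a DataFrame with one row per function and one column per input column, with NaN wherever a function was not requested for that column. Both lambdas are labelled <lambda>, so they share a single row: that row holds the median for A and the 75th percentile for B. The values vary between runs because the data is random.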
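Because agg labels each result row with the function's name, a named function gives the quantile row a meaningful label instead of <lambda>. A minimal sketch, where q75 is a hypothetical helper name chosen for illustration and the data is arbitrary:

```python
import pandas as pd

df = pd.DataFrame({'A': [1.0, 2.0, 3.0, 4.0], 'B': [5, 15, 25, 35]})

def q75(x):
    """Hypothetical helper: 75th percentile; its name becomes the row label."""
    return x.quantile(0.75)

result = df.agg({'A': ['mean', q75], 'B': ['max', q75]})
print(result)
```

With named functions, each row means the same thing for every column, which avoids the shared <lambda> row described for Example 5.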
Advanced Examples of agg and quantile
Example 6: Custom Aggregation Function
import pandas as pd
import numpy as np
# Custom aggregation function: the interquartile range (IQR),
# i.e. the spread of the middle 50% of the values
def custom_agg(x):
    q1 = x.quantile(0.25)
    q3 = x.quantile(0.75)
    return q3 - q1
# Create a DataFrame
df = pd.DataFrame({
    'A': np.random.randn(100),
    'B': np.random.randint(1, 100, 100)
})
# Apply custom aggregation function
result = df.agg({'A': custom_agg, 'B': custom_agg})
print(result)
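Output: a Series with the interquartile range of A and of B. The values vary between runs because the data is random.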
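The same interquartile range can also be computed without a custom function, by subtracting two quantile calls on the whole DataFrame. This sketch uses fixed data (including an arbitrary outlier, 100, in column B) and a hypothetical helper named iqr to compare both routes.

```python
import pandas as pd

df = pd.DataFrame({'A': [1.0, 2.0, 3.0, 4.0, 5.0], 'B': [2, 4, 6, 8, 100]})

def iqr(x):
    # Hypothetical helper: interquartile range of one column
    return x.quantile(0.75) - x.quantile(0.25)

# Column by column via agg
via_agg = df.agg(iqr)

# Vectorized alternative: both quantiles for every column at once
via_quantile = df.quantile(0.75) - df.quantile(0.25)

print(via_agg)
print(via_quantile)
```

Both give the same result here; the outlier in B does not change its IQR, which is one reason the IQR is a popular measure of spread.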
Example 7: Aggregating Over Rows
import pandas as pd
import numpy as np
# Create a DataFrame
df = pd.DataFrame({
    'A': np.random.randn(5),
    'B': np.random.randint(1, 100, 5)
})
# axis=1 aggregates across the columns, giving the mean of each row
result = df.agg('mean', axis=1)
print(result)
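Output: a Series with one mean per row (five values). The values vary between runs because the data is random.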
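The axis=1 option works the same way for quantile and for lists of functions. A small sketch with fixed data, so the row results are easy to verify by hand:

```python
import pandas as pd

df = pd.DataFrame({'A': [1.0, 2.0], 'B': [3.0, 9.0], 'C': [8.0, 4.0]})

# Each statistic is computed across the columns of one row
row_means = df.agg('mean', axis=1)
row_medians = df.quantile(0.5, axis=1)

# A list of functions with axis=1 gives one row per original row
# and one column per function
row_range = df.agg(['min', 'max'], axis=1)

print(row_means)
print(row_medians)
print(row_range)
```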
Example 8: Using agg with a Dictionary
import pandas as pd
import numpy as np
# Create a DataFrame
df = pd.DataFrame({
    'A': np.random.randn(10),
    'B': np.random.randint(1, 100, 10),
    'C': np.random.randn(10)
})
# Using a dictionary to specify multiple operations for different columns
result = df.agg({
    'A': ['sum', 'min'],
    'B': ['max', 'mean'],
    'C': ['std', lambda x: x.quantile(0.5)]
})
print(result)
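Output: a DataFrame whose rows are all the requested functions (the lambda row is labelled <lambda>) and whose columns are A, B and C, with NaN wherever a function was not requested for a column. The values vary between runs because the data is random.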
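With random data it is hard to see the layout of the result, so here is a sketch of the same pattern on fixed values. It uses 'median' in place of the lambda; for a Series, median gives the same value as quantile(0.5).

```python
import pandas as pd

df = pd.DataFrame({
    'A': [1.0, 2.0, 3.0],
    'B': [10, 20, 30],
    'C': [0.0, 5.0, 10.0],
})

result = df.agg({'A': ['sum', 'min'], 'B': ['max', 'mean'], 'C': ['std', 'median']})

# Rows are the union of all requested functions; a cell is NaN when that
# function was not requested for that column
print(result)

# dropna() recovers just the statistics requested for one column
print(result['C'].dropna())
```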
This guide has shown how to use the agg and quantile functions in Pandas for a range of data summarization tasks, from single aggregations to per-column quantiles and custom measures such as the interquartile range. With these functions you can summarize large datasets concisely and extract meaningful insights from your data.