Pandas DataFrame Filter by Column Value
Filtering data based on column values is a common operation in data analysis. Pandas, a powerful and flexible data manipulation library for Python, provides several ways to select the rows of a DataFrame whose values in one or more columns meet a condition. This article walks through these techniques, including boolean indexing, the query method, the loc and iloc accessors, and the isin method.
1. Boolean Indexing
Boolean indexing is one of the most straightforward methods for filtering data in Pandas. It involves creating a boolean mask that is True for rows where the condition is met and False otherwise. This mask is then used to index the DataFrame.
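The sketch below builds the mask as a separate step before using it, so both steps are visible. It uses a trimmed version of the sample data from the examples below (only the Name and Age columns). The variable name mask is an illustrative choice.

```python
import pandas as pd

# Trimmed version of the sample data (Name and Age only)
df = pd.DataFrame({'Name': ['Alice', 'Bob', 'Charlie', 'David', 'Eve'],
                   'Age': [24, 27, 22, 32, 29]})

# Step 1: comparing a column with a scalar yields a boolean Series (the mask)
mask = df['Age'] > 25
print(mask.tolist())  # [False, True, False, True, True]

# Step 2: indexing the DataFrame with the mask keeps only the rows where it is True
filtered_df = df[mask]
print(filtered_df['Name'].tolist())  # ['Bob', 'David', 'Eve']
```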
Example 1: Filter rows where a column’s value is greater than a specified value
import pandas as pd
# Create a sample DataFrame
data = {'Name': ['Alice', 'Bob', 'Charlie', 'David', 'Eve'],
        'Age': [24, 27, 22, 32, 29],
        'Website': ['pandasdataframe.com', 'example.com', 'pandasdataframe.com', 'example.com', 'pandasdataframe.com']}
df = pd.DataFrame(data)
# Filter rows where the age is greater than 25
filtered_df = df[df['Age'] > 25]
print(filtered_df)
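The filtered DataFrame keeps the original index labels of the rows that survive, so the result is not renumbered 0, 1, 2. A short sketch, assuming the same Name and Age data as above:

```python
import pandas as pd

df = pd.DataFrame({'Name': ['Alice', 'Bob', 'Charlie', 'David', 'Eve'],
                   'Age': [24, 27, 22, 32, 29]})

filtered_df = df[df['Age'] > 25]
# The remaining rows keep their original index labels
print(filtered_df.index.tolist())  # [1, 3, 4]

# reset_index(drop=True) renumbers the rows from 0 and discards the old labels
renumbered = filtered_df.reset_index(drop=True)
print(renumbered.index.tolist())  # [0, 1, 2]
```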
Example 2: Filter rows based on multiple conditions
import pandas as pd
# Create a sample DataFrame
data = {'Name': ['Alice', 'Bob', 'Charlie', 'David', 'Eve'],
        'Age': [24, 27, 22, 32, 29],
        'Website': ['pandasdataframe.com', 'example.com', 'pandasdataframe.com', 'example.com', 'pandasdataframe.com']}
df = pd.DataFrame(data)
# Filter rows where the age is greater than 25 and the website is 'pandasdataframe.com'
filtered_df = df[(df['Age'] > 25) & (df['Website'] == 'pandasdataframe.com')]
print(filtered_df)
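The parentheses around each comparison are required. In Python, & binds more tightly than > and ==, so without them Python would try to evaluate 25 & df['Website'] first. Python's own and, or and not keywords also do not work element-wise on a Series. Use | for OR and ~ for NOT instead. A small sketch on the same sample data (the age thresholds are arbitrary):

```python
import pandas as pd

df = pd.DataFrame({'Name': ['Alice', 'Bob', 'Charlie', 'David', 'Eve'],
                   'Age': [24, 27, 22, 32, 29],
                   'Website': ['pandasdataframe.com', 'example.com', 'pandasdataframe.com',
                               'example.com', 'pandasdataframe.com']})

# OR: rows where the age is under 23 or over 30
either = df[(df['Age'] < 23) | (df['Age'] > 30)]
print(either['Name'].tolist())  # ['Charlie', 'David']

# NOT: rows whose website is not 'pandasdataframe.com'
others = df[~(df['Website'] == 'pandasdataframe.com')]
print(others['Name'].tolist())  # ['Bob', 'David']
```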
2. The query Method
The query method filters rows using a boolean expression written as a string. This can make the code more readable and concise, especially when several conditions are combined.
Example 3: Using query to filter rows
import pandas as pd
# Create a sample DataFrame
data = {'Name': ['Alice', 'Bob', 'Charlie', 'David', 'Eve'],
        'Age': [24, 27, 22, 32, 29],
        'Website': ['pandasdataframe.com', 'example.com', 'pandasdataframe.com', 'example.com', 'pandasdataframe.com']}
df = pd.DataFrame(data)
# Use query to filter rows
filtered_df = df.query('Age > 25')
print(filtered_df)
Example 4: Using query with multiple conditions
import pandas as pd
# Create a sample DataFrame
data = {'Name': ['Alice', 'Bob', 'Charlie', 'David', 'Eve'],
        'Age': [24, 27, 22, 32, 29],
        'Website': ['pandasdataframe.com', 'example.com', 'pandasdataframe.com', 'example.com', 'pandasdataframe.com']}
df = pd.DataFrame(data)
# Use query to filter rows with multiple conditions
filtered_df = df.query('Age > 25 and Website == "pandasdataframe.com"')
print(filtered_df)
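Inside a query string, you can reference a Python variable by prefixing its name with @. This avoids pasting values into the string by hand. A sketch assuming the same sample data; min_age and site are illustrative variable names:

```python
import pandas as pd

df = pd.DataFrame({'Name': ['Alice', 'Bob', 'Charlie', 'David', 'Eve'],
                   'Age': [24, 27, 22, 32, 29],
                   'Website': ['pandasdataframe.com', 'example.com', 'pandasdataframe.com',
                               'example.com', 'pandasdataframe.com']})

# Illustrative variables, referenced in the query string with an @ prefix
min_age = 25
site = 'pandasdataframe.com'

filtered_df = df.query('Age > @min_age and Website == @site')
print(filtered_df['Name'].tolist())  # ['Eve']
```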
3. Using the loc and iloc Accessors
The loc and iloc accessors give finer control over which rows and columns are selected. loc is label-based: you select rows and columns by their index labels and column names, and it also accepts a boolean mask for the rows. iloc is position-based: you select rows and columns by their integer positions, starting at 0.
Example 5: Using loc to filter rows
import pandas as pd
# Create a sample DataFrame
data = {'Name': ['Alice', 'Bob', 'Charlie', 'David', 'Eve'],
        'Age': [24, 27, 22, 32, 29],
        'Website': ['pandasdataframe.com', 'example.com', 'pandasdataframe.com', 'example.com', 'pandasdataframe.com']}
df = pd.DataFrame(data)
# Use loc to filter rows
filtered_df = df.loc[df['Age'] > 25]
print(filtered_df)
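loc takes a row selector and a column selector separated by a comma. This lets you filter rows by a mask and pick columns by name in a single call. It also accepts index labels directly. A sketch on the same sample data, which has the default integer index 0 to 4:

```python
import pandas as pd

df = pd.DataFrame({'Name': ['Alice', 'Bob', 'Charlie', 'David', 'Eve'],
                   'Age': [24, 27, 22, 32, 29],
                   'Website': ['pandasdataframe.com', 'example.com', 'pandasdataframe.com',
                               'example.com', 'pandasdataframe.com']})

# Rows chosen by a boolean mask, columns chosen by name, in a single .loc call
subset = df.loc[df['Age'] > 25, ['Name', 'Website']]
print(subset)

# Rows chosen by index label (this DataFrame has the default labels 0 to 4)
by_label = df.loc[[0, 2], 'Name']
print(by_label.tolist())  # ['Alice', 'Charlie']
```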
Example 6: Using loc with multiple conditions
import pandas as pd
# Create a sample DataFrame
data = {'Name': ['Alice', 'Bob', 'Charlie', 'David', 'Eve'],
        'Age': [24, 27, 22, 32, 29],
        'Website': ['pandasdataframe.com', 'example.com', 'pandasdataframe.com', 'example.com', 'pandasdataframe.com']}
df = pd.DataFrame(data)
# Use loc to filter rows with multiple conditions
filtered_df = df.loc[(df['Age'] > 25) & (df['Website'] == 'pandasdataframe.com')]
print(filtered_df)
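The examples above only use loc. iloc does not accept a boolean Series as a mask (it raises an error), but it does accept a plain boolean NumPy array, which .to_numpy() produces. A minimal sketch, assuming the same sample data; the column positions 0 and 1 refer to Name and Age in this particular DataFrame:

```python
import pandas as pd

df = pd.DataFrame({'Name': ['Alice', 'Bob', 'Charlie', 'David', 'Eve'],
                   'Age': [24, 27, 22, 32, 29],
                   'Website': ['pandasdataframe.com', 'example.com', 'pandasdataframe.com',
                               'example.com', 'pandasdataframe.com']})

mask = (df['Age'] > 25) & (df['Website'] == 'pandasdataframe.com')

# iloc is position-based: rows from a plain boolean array, columns by position
# (position 0 is Name and position 1 is Age in this DataFrame)
filtered_df = df.iloc[mask.to_numpy(), [0, 1]]
print(filtered_df)
```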
4. Filtering with isin
The isin method is useful when you need to keep the rows whose value in a column appears in a predefined list of values.
Example 7: Using isin to filter rows
import pandas as pd
# Create a sample DataFrame
data = {'Name': ['Alice', 'Bob', 'Charlie', 'David', 'Eve'],
        'Age': [24, 27, 22, 32, 29],
        'Website': ['pandasdataframe.com', 'example.com', 'pandasdataframe.com', 'example.com', 'pandasdataframe.com']}
df = pd.DataFrame(data)
# Define a list of names
names = ['Alice', 'David']
# Use isin to filter rows
filtered_df = df[df['Name'].isin(names)]
print(filtered_df)
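You can invert the mask returned by isin with ~ to exclude the listed values instead of keeping them. A sketch on the same sample data (Name and Age only); the list of names and the set of ages are arbitrary:

```python
import pandas as pd

df = pd.DataFrame({'Name': ['Alice', 'Bob', 'Charlie', 'David', 'Eve'],
                   'Age': [24, 27, 22, 32, 29]})

names = ['Alice', 'David']

# ~ inverts the isin mask: keep rows whose Name is NOT in the list
excluded_df = df[~df['Name'].isin(names)]
print(excluded_df['Name'].tolist())  # ['Bob', 'Charlie', 'Eve']

# isin is not limited to strings; here it matches against a set of ages
age_df = df[df['Age'].isin({22, 29})]
print(age_df['Name'].tolist())  # ['Charlie', 'Eve']
```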
5. Using the filter Method
The filter method selects columns (or, with axis=0, rows) by their labels rather than by their values. It does not filter rows based on column values on its own, but it can be combined with the methods above to do so.
Example 8: Using filter to select columns
import pandas as pd
# Create a sample DataFrame
data = {'Name': ['Alice', 'Bob', 'Charlie', 'David', 'Eve'],
        'Age': [24, 27, 22, 32, 29],
        'Website': ['pandasdataframe.com', 'example.com', 'pandasdataframe.com', 'example.com', 'pandasdataframe.com']}
df = pd.DataFrame(data)
# Use filter to select specific columns
filtered_columns = df.filter(items=['Name', 'Website'])
print(filtered_columns)
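One way to combine the two is to filter rows with a boolean mask first and then call filter on the result to keep only certain columns. filter can also match column names by substring (like=) or by regular expression (regex=). A sketch assuming the same sample data:

```python
import pandas as pd

df = pd.DataFrame({'Name': ['Alice', 'Bob', 'Charlie', 'David', 'Eve'],
                   'Age': [24, 27, 22, 32, 29],
                   'Website': ['pandasdataframe.com', 'example.com', 'pandasdataframe.com',
                               'example.com', 'pandasdataframe.com']})

# First keep the rows by value, then keep only the listed columns
result = df[df['Age'] > 25].filter(items=['Name', 'Website'])
print(result)

# like= keeps columns whose name contains the substring
like_cols = df.filter(like='Na').columns.tolist()
print(like_cols)  # ['Name']

# regex= keeps columns whose name matches the pattern
regex_cols = df.filter(regex='^(Name|Age)$').columns.tolist()
print(regex_cols)  # ['Name', 'Age']
```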
Conclusion
In this article, we explored several ways to filter the rows of a Pandas DataFrame based on column values: boolean indexing, the query method, the loc and iloc accessors, and the isin method. We also covered the filter method for selecting columns. Boolean indexing and loc are the most general options, query can make long compound conditions easier to read, and isin is the natural choice when matching a column against a list of values. Choose the method that best fits the requirements of your data manipulation task.