Pandas DataFrame Filter

Pandas is a Python library that provides flexible data structures for data manipulation and analysis. One of its most commonly used structures is the DataFrame, a two-dimensional labeled data structure whose columns can hold different types.

Filtering is one of the most frequent operations performed on a DataFrame. It allows you to select specific rows or columns from a DataFrame based on some condition. In this article, we will explore different ways to filter data in a Pandas DataFrame.

1. Using Boolean Indexing

Boolean indexing is a type of indexing that allows you to select rows or columns from a DataFrame based on a Boolean condition. Here is an example:

import pandas as pd

data = {'Name': ['John', 'Anna', 'Peter', 'Linda'],
        'Age': [28, 24, 35, 32],
        'Country': ['USA', 'USA', 'UK', 'Canada']}
df = pd.DataFrame(data)

# Filter rows where Age is greater than 30
filtered_df = df[df['Age'] > 30]
print(filtered_df)

Output:

    Name  Age Country
2  Peter   35      UK
3  Linda   32  Canada

In the above example, df['Age'] > 30 returns a Boolean Series where each element is True if the corresponding age is greater than 30, and False otherwise. The DataFrame df is then indexed with this Boolean Series, returning only the rows where the condition is True.
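Boolean Series can also be combined before indexing. The following is a minimal sketch, assuming the same sample data as above; the variable names are illustrative only. It uses the element-wise operators & (and) and | (or) rather than the Python keywords and/or.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['John', 'Anna', 'Peter', 'Linda'],
    'Age': [28, 24, 35, 32],
    'Country': ['USA', 'USA', 'UK', 'Canada'],
})

# Each condition needs its own parentheses because & and | bind
# more tightly than comparison operators such as > and ==.
over_30_in_uk = df[(df['Age'] > 30) & (df['Country'] == 'UK')]
print(over_30_in_uk)

# Rows that satisfy at least one of the two conditions
under_25_or_canada = df[(df['Age'] < 25) | (df['Country'] == 'Canada')]
print(under_25_or_canada)
```

Leaving out the parentheses typically raises an error or produces an unintended mask, so it is safest to always wrap each comparison.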

2. Using the query Method

The query method allows you to filter data using a query string. Here is an example:

import pandas as pd

data = {'Name': ['John', 'Anna', 'Peter', 'Linda'],
        'Age': [28, 24, 35, 32],
        'Country': ['USA', 'USA', 'UK', 'Canada']}
df = pd.DataFrame(data)

# Filter rows where Age is greater than 30
filtered_df = df.query('Age > 30')
print(filtered_df)

Output:

    Name  Age Country
2  Peter   35      UK
3  Linda   32  Canada

In the above example, the query string 'Age > 30' is used to filter the DataFrame. The query method returns a new DataFrame containing only the rows where the condition is True.
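A query string can also refer to Python variables and combine several conditions. The sketch below assumes the same sample data; min_age is a hypothetical variable introduced only for illustration. Inside the query string, a name prefixed with @ refers to a variable in the surrounding Python scope.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['John', 'Anna', 'Peter', 'Linda'],
    'Age': [28, 24, 35, 32],
    'Country': ['USA', 'USA', 'UK', 'Canada'],
})

# Hypothetical threshold, referenced in the query string as @min_age
min_age = 30
older_df = df.query('Age > @min_age')
print(older_df)

# Conditions can be joined with and / or inside the query string
young_usa_df = df.query('Age < 30 and Country == "USA"')
print(young_usa_df)
```

This can be easier to read than nested Boolean masks when a filter involves several columns.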

3. Using the loc and iloc Indexers

loc and iloc are indexers, used with square brackets rather than called like methods. loc selects data by label or by a Boolean mask, while iloc selects data by integer position. Here are examples:

import pandas as pd

data = {'Name': ['John', 'Anna', 'Peter', 'Linda'],
        'Age': [28, 24, 35, 32],
        'Country': ['USA', 'USA', 'UK', 'Canada']}
df = pd.DataFrame(data)

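# Filter rows where Age is greater than 30 using loc
over_30_df = df.loc[df['Age'] > 30]
print(over_30_df)

# Select the first three rows by position using iloc
first_three_df = df.iloc[0:3]
print(first_three_df)

Output:

    Name  Age Country
2  Peter   35      UK
3  Linda   32  Canada
    Name  Age Country
0   John   28     USA
1   Anna   24     USA
2  Peter   35      UK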

In the first example, df['Age'] > 30 returns a Boolean Series, which loc uses to select the matching rows. In the second example, df.iloc[0:3] returns the first three rows of the DataFrame by position, regardless of their index labels.
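Both indexers also accept a second argument that selects columns, and they treat slices differently. The following sketch assumes the same sample data with its default integer index; the variable names are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['John', 'Anna', 'Peter', 'Linda'],
    'Age': [28, 24, 35, 32],
    'Country': ['USA', 'USA', 'UK', 'Canada'],
})

# loc: Boolean mask for the rows, list of labels for the columns
names_over_30 = df.loc[df['Age'] > 30, ['Name', 'Country']]
print(names_over_30)

# iloc: integer positions for both rows and columns
top_left = df.iloc[0:2, 0:2]
print(top_left)

# Label slices with loc include the end label; position slices
# with iloc exclude the end position.
label_slice = df.loc[0:2]      # labels 0, 1 and 2 (three rows)
position_slice = df.iloc[0:2]  # positions 0 and 1 (two rows)
print(len(label_slice), len(position_slice))
```

The slicing difference is easy to overlook when the index happens to be the default 0, 1, 2, ... sequence, as it is here.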

4. Using the isin Method

The isin method tests whether each element of a Series or DataFrame is contained in a collection of values. The resulting Boolean mask can then be used to filter rows. Here is an example:

import pandas as pd

data = {'Name': ['John', 'Anna', 'Peter', 'Linda'],
        'Age': [28, 24, 35, 32],
        'Country': ['USA', 'USA', 'UK', 'Canada']}
df = pd.DataFrame(data)

# Filter rows where Country is either USA or UK
filtered_df = df[df['Country'].isin(['USA', 'UK'])]
print(filtered_df)

Output:

    Name  Age Country
0   John   28     USA
1   Anna   24     USA
2  Peter   35      UK

In the above example, df['Country'].isin(['USA', 'UK']) returns a Boolean Series where each element is True if the corresponding country is either 'USA' or 'UK', and False otherwise. The DataFrame df is then indexed with this Boolean Series, returning only the rows where the condition is True.
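The same mask can be inverted to exclude values instead of keeping them. This is a minimal sketch on the same sample data, using the ~ operator to negate a Boolean Series; the variable name is illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['John', 'Anna', 'Peter', 'Linda'],
    'Age': [28, 24, 35, 32],
    'Country': ['USA', 'USA', 'UK', 'Canada'],
})

# ~ flips True and False, so this keeps rows whose Country
# is NOT in the list
outside_usa_uk = df[~df['Country'].isin(['USA', 'UK'])]
print(outside_usa_uk)
```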

5. Using the filter Method

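The filter method selects rows or columns by their labels, not by the values they contain. Labels can be matched with the items, like, or regex parameters, and for a DataFrame the columns are searched by default. Here is an example: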

import pandas as pd

data = {'Name': ['John', 'Anna', 'Peter', 'Linda'],
        'Age': [28, 24, 35, 32],
        'Country': ['USA', 'USA', 'UK', 'Canada']}
df = pd.DataFrame(data)

# Filter columns that contain the string 'Name'
filtered_df = df.filter(like='Name')
print(filtered_df)

Output:

    Name
0   John
1   Anna
2  Peter
3  Linda

In the above example, df.filter(like='Name') returns a new DataFrame containing only the columns whose labels contain the string 'Name'.
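The other ways of matching labels with filter can be sketched as follows, assuming the same sample data. Setting Name as the index in the last step is an illustrative choice made only to give the rows meaningful labels.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['John', 'Anna', 'Peter', 'Linda'],
    'Age': [28, 24, 35, 32],
    'Country': ['USA', 'USA', 'UK', 'Canada'],
})

# items: keep columns whose labels match exactly
name_age_df = df.filter(items=['Name', 'Age'])
print(name_age_df)

# regex: keep columns whose labels match a regular expression
# (here, labels starting with 'C')
c_columns_df = df.filter(regex='^C')
print(c_columns_df)

# axis=0: match row labels instead of column labels
# (here, index labels containing the letter 'n')
rows_with_n = df.set_index('Name').filter(like='n', axis=0)
print(rows_with_n)
```

Only one of items, like, and regex can be passed in a single call.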

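In conclusion, Pandas provides several ways to filter a DataFrame: Boolean indexing and loc for conditions on values, query for conditions written as strings, iloc for selection by position, isin for membership tests, and filter for selection by label. Which one you choose depends on whether you are filtering by value, by position, or by label.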