Pandas DataFrame Loc with 2 Conditions

Pandas DataFrame Loc with 2 Conditions

Pandas is a powerful Python library used for data manipulation and analysis. One of its core structures is DataFrame, which can be thought of as a table with rows and columns. The loc attribute is an essential tool in Pandas that allows for label-based indexing, which means you can select data based on the data index value, not just the position. It is particularly useful when you need to filter data based on multiple conditions. In this article, we will explore how to use the loc method in Pandas DataFrames to filter data based on two conditions.

Understanding the Basics of DataFrame and loc

Before diving into complex examples, let’s understand the basic structure of a DataFrame and how the loc method works. A DataFrame is a two-dimensional size-mutable, potentially heterogeneous tabular data structure with labeled axes (rows and columns). Here is how you can create a simple DataFrame:

import pandas as pd

data = {
    'Name': ['John Doe', 'Jane Smith', 'Alice Johnson', 'Mike Brown'],
    'Age': [28, 34, 24, 42],
    'City': ['New York', 'Los Angeles', 'New York', 'Chicago']
}

df = pd.DataFrame(data)
print(df)

Output:

Pandas DataFrame Loc with 2 Conditions

The loc method allows you to access a group of rows and columns by labels or a boolean array. loc will raise KeyError when the items are not found. Below is a simple example of using loc:

import pandas as pd

data = {
    'Name': ['John Doe', 'Jane Smith', 'Alice Johnson', 'Mike Brown'],
    'Age': [28, 34, 24, 42],
    'City': ['New York', 'Los Angeles', 'New York', 'Chicago']
}
df = pd.DataFrame(data)

result = df.loc[df['Age'] > 30]
print(result)

Output:

Pandas DataFrame Loc with 2 Conditions

Filtering with Two Conditions

When you need to filter data based on more than one condition, you can combine these conditions using logical operators such as & (and), | (or), and ~ (not). It is important to wrap each condition in parentheses because of Python’s operator precedence rules.

Example 1: Using & Operator

Let’s filter the DataFrame to find people who are from ‘New York’ and are older than 25.

import pandas as pd

data = {
    'Name': ['John Doe', 'Jane Smith', 'Alice Johnson', 'Mike Brown'],
    'Age': [28, 34, 24, 42],
    'City': ['New York', 'Los Angeles', 'New York', 'Chicago']
}
df = pd.DataFrame(data)

result = df.loc[(df['City'] == 'New York') & (df['Age'] > 25)]
print(result)

Output:

Pandas DataFrame Loc with 2 Conditions

Example 2: Using | Operator

Now, let’s find all people who are either from ‘New York’ or older than 30.

import pandas as pd

data = {
    'Name': ['John Doe', 'Jane Smith', 'Alice Johnson', 'Mike Brown'],
    'Age': [28, 34, 24, 42],
    'City': ['New York', 'Los Angeles', 'New York', 'Chicago']
}
df = pd.DataFrame(data)

result = df.loc[(df['City'] == 'New York') | (df['Age'] > 30)]
print(result)

Output:

Pandas DataFrame Loc with 2 Conditions

Example 3: Combining & and |

You can also combine these operators to perform more complex queries. For instance, finding all people who are from ‘New York’ and older than 25, or people from ‘Los Angeles’.

import pandas as pd

data = {
    'Name': ['John Doe', 'Jane Smith', 'Alice Johnson', 'Mike Brown'],
    'Age': [28, 34, 24, 42],
    'City': ['New York', 'Los Angeles', 'New York', 'Chicago']
}
df = pd.DataFrame(data)

result = df.loc[((df['City'] == 'New York') & (df['Age'] > 25)) | (df['City'] == 'Los Angeles')]
print(result)

Output:

Pandas DataFrame Loc with 2 Conditions

Practical Examples

Now that we understand the basics, let’s dive into more practical examples. We will use a more complex dataset and perform various data filtering operations using two conditions.

Example 4: Filtering Based on Text and Numeric Conditions

Suppose we have a dataset that includes information about various articles, including the domain and the number of views.

import pandas as pd

data = {
    'Title': ['Article 1', 'Article 2', 'Article 3', 'Article 4'],
    'Domain': ['pandasdataframe.com', 'example.com', 'pandasdataframe.com', 'data.com'],
    'Views': [150, 90, 200, 300]
}
df = pd.DataFrame(data)

articles_df = pd.DataFrame(data)
print(articles_df)

Output:

Pandas DataFrame Loc with 2 Conditions

Let’s find articles from ‘pandasdataframe.com’ with more than 100 views.

import pandas as pd

data = {
    'Title': ['Article 1', 'Article 2', 'Article 3', 'Article 4'],
    'Domain': ['pandasdataframe.com', 'example.com', 'pandasdataframe.com', 'data.com'],
    'Views': [150, 90, 200, 300]
}
df = pd.DataFrame(data)

articles_df = pd.DataFrame(data)

result = articles_df.loc[(articles_df['Domain'] == 'pandasdataframe.com') & (articles_df['Views'] > 100)]
print(result)

Output:

Pandas DataFrame Loc with 2 Conditions

Example 5: Using loc with Date Conditions

If you are working with time series data, you might need to filter based on date conditions. Let’s assume our dataset includes a date column.

import pandas as pd

data = {
    'Title': ['Article 1', 'Article 2', 'Article 3', 'Article 4'],
    'PublishedDate': ['2021-01-01', '2021-01-02', '2021-01-03', '2021-01-04'],
    'Views': [150, 90, 200, 300]
}

articles_df = pd.DataFrame(data)
articles_df['PublishedDate'] = pd.to_datetime(articles_df['PublishedDate'])

# Filter articles published after '2021-01-01' with more than 100 views
result = articles_df.loc[(articles_df['PublishedDate'] > '2021-01-01') & (articles_df['Views'] > 100)]
print(result)

Output:

Pandas DataFrame Loc with 2 Conditions

Example 6: Filtering with String Methods

Pandas also allows you to use string methods to perform filtering. Suppose you want to find articles where the title contains the word ‘Data’.

import pandas as pd

data = {
    'Title': ['Article 1', 'Article 2', 'Article 3', 'Article 4'],
    'PublishedDate': ['2021-01-01', '2021-01-02', '2021-01-03', '2021-01-04'],
    'Views': [150, 90, 200, 300]
}

articles_df = pd.DataFrame(data)
articles_df['PublishedDate'] = pd.to_datetime(articles_df['PublishedDate'])

result = articles_df.loc[articles_df['Title'].str.contains('Data') & (articles_df['Views'] > 100)]
print(result)

Output:

Pandas DataFrame Loc with 2 Conditions

Example 7: Case-Insensitive Filtering

If you need case-insensitive matching, you can use the case parameter in string methods.

import pandas as pd

data = {
    'Title': ['Article 1', 'Article 2', 'Article 3', 'Article 4'],
    'PublishedDate': ['2021-01-01', '2021-01-02', '2021-01-03', '2021-01-04'],
    'Views': [150, 90, 200, 300]
}

articles_df = pd.DataFrame(data)
articles_df['PublishedDate'] = pd.to_datetime(articles_df['PublishedDate'])

result = articles_df.loc[articles_df['Title'].str.contains('data', case=False) & (articles_df['Views'] > 100)]
print(result)

Output:

Pandas DataFrame Loc with 2 Conditions

Example 8: Filtering with isin

The isin method is useful when you need to filter data based on a list of values. For example, finding articles from either ‘pandasdataframe.com’ or ‘example.com’.

import pandas as pd

data = {
    'Title': ['Article 1', 'Article 2', 'Article 3', 'Article 4'],
    'PublishedDate': ['2021-01-01', '2021-01-02', '2021-01-03', '2021-01-04'],
    'Views': [150, 90, 200, 300]
}
df = pd.DataFrame(data)

articles_df = pd.DataFrame(data)
articles_df['PublishedDate'] = pd.to_datetime(articles_df['PublishedDate'])

domains = ['pandasdataframe.com', 'example.com']
result = articles_df.loc[articles_df['Domain'].isin(domains) & (articles_df['Views'] > 100)]
print(result)

Example 9: Combining Conditions Across Different Columns

You can also combine conditions that involve different columns. For instance, finding articles where the number of views is greater than the average views.

import pandas as pd

data = {
    'Title': ['Article 1', 'Article 2', 'Article 3', 'Article 4'],
    'PublishedDate': ['2021-01-01', '2021-01-02', '2021-01-03', '2021-01-04'],
    'Views': [150, 90, 200, 300]
}
df = pd.DataFrame(data)

articles_df = pd.DataFrame(data)
articles_df['PublishedDate'] = pd.to_datetime(articles_df['PublishedDate'])

average_views = articles_df['Views'].mean()
result = articles_df.loc[(articles_df['Domain'] == 'pandasdataframe.com') & (articles_df['Views'] > average_views)]
print(result)

Example 10: Using query Method

Pandas also provides a query method that can be used to perform filtering. This can sometimes be more readable, especially with complex conditions.

import pandas as pd

data = {
    'Title': ['Article 1', 'Article 2', 'Article 3', 'Article 4'],
    'PublishedDate': ['2021-01-01', '2021-01-02', '2021-01-03', '2021-01-04'],
    'Views': [150, 90, 200, 300]
}

articles_df = pd.DataFrame(data)
articles_df['PublishedDate'] = pd.to_datetime(articles_df['PublishedDate'])

result = articles_df.query("Domain == 'pandasdataframe.com' and Views > 100")
print(result)

Pandas DataFrame Loc with 2 Conditions Conclusion

Filtering data based on multiple conditions is a common task in data analysis. Using the loc method in Pandas, you can easily filter data in a DataFrame based on any number of conditions. This article provided several examples of how to use loc with two conditions, combining different logical operators and methods. By mastering these techniques, you can efficiently perform complex data filtering and manipulation tasks in your data analysis projects.

Like(0)