Pandas DataFrame Loc with 2 Conditions
Pandas is a powerful Python library used for data manipulation and analysis. One of its core structures is DataFrame, which can be thought of as a table with rows and columns. The loc
attribute is an essential tool in Pandas that allows for label-based indexing, which means you can select data based on the data index value, not just the position. It is particularly useful when you need to filter data based on multiple conditions. In this article, we will explore how to use the loc
method in Pandas DataFrames to filter data based on two conditions.
Understanding the Basics of DataFrame and loc
Before diving into complex examples, let’s understand the basic structure of a DataFrame and how the loc
method works. A DataFrame is a two-dimensional size-mutable, potentially heterogeneous tabular data structure with labeled axes (rows and columns). Here is how you can create a simple DataFrame:
import pandas as pd
data = {
'Name': ['John Doe', 'Jane Smith', 'Alice Johnson', 'Mike Brown'],
'Age': [28, 34, 24, 42],
'City': ['New York', 'Los Angeles', 'New York', 'Chicago']
}
df = pd.DataFrame(data)
print(df)
Output:
The loc
method allows you to access a group of rows and columns by labels or a boolean array. loc
will raise KeyError
when the items are not found. Below is a simple example of using loc
:
import pandas as pd
data = {
'Name': ['John Doe', 'Jane Smith', 'Alice Johnson', 'Mike Brown'],
'Age': [28, 34, 24, 42],
'City': ['New York', 'Los Angeles', 'New York', 'Chicago']
}
df = pd.DataFrame(data)
result = df.loc[df['Age'] > 30]
print(result)
Output:
Filtering with Two Conditions
When you need to filter data based on more than one condition, you can combine these conditions using logical operators such as &
(and), |
(or), and ~
(not). It is important to wrap each condition in parentheses because of Python’s operator precedence rules.
Example 1: Using &
Operator
Let’s filter the DataFrame to find people who are from ‘New York’ and are older than 25.
import pandas as pd
data = {
'Name': ['John Doe', 'Jane Smith', 'Alice Johnson', 'Mike Brown'],
'Age': [28, 34, 24, 42],
'City': ['New York', 'Los Angeles', 'New York', 'Chicago']
}
df = pd.DataFrame(data)
result = df.loc[(df['City'] == 'New York') & (df['Age'] > 25)]
print(result)
Output:
Example 2: Using |
Operator
Now, let’s find all people who are either from ‘New York’ or older than 30.
import pandas as pd
data = {
'Name': ['John Doe', 'Jane Smith', 'Alice Johnson', 'Mike Brown'],
'Age': [28, 34, 24, 42],
'City': ['New York', 'Los Angeles', 'New York', 'Chicago']
}
df = pd.DataFrame(data)
result = df.loc[(df['City'] == 'New York') | (df['Age'] > 30)]
print(result)
Output:
Example 3: Combining &
and |
You can also combine these operators to perform more complex queries. For instance, finding all people who are from ‘New York’ and older than 25, or people from ‘Los Angeles’.
import pandas as pd
data = {
'Name': ['John Doe', 'Jane Smith', 'Alice Johnson', 'Mike Brown'],
'Age': [28, 34, 24, 42],
'City': ['New York', 'Los Angeles', 'New York', 'Chicago']
}
df = pd.DataFrame(data)
result = df.loc[((df['City'] == 'New York') & (df['Age'] > 25)) | (df['City'] == 'Los Angeles')]
print(result)
Output:
Practical Examples
Now that we understand the basics, let’s dive into more practical examples. We will use a more complex dataset and perform various data filtering operations using two conditions.
Example 4: Filtering Based on Text and Numeric Conditions
Suppose we have a dataset that includes information about various articles, including the domain and the number of views.
import pandas as pd
data = {
'Title': ['Article 1', 'Article 2', 'Article 3', 'Article 4'],
'Domain': ['pandasdataframe.com', 'example.com', 'pandasdataframe.com', 'data.com'],
'Views': [150, 90, 200, 300]
}
df = pd.DataFrame(data)
articles_df = pd.DataFrame(data)
print(articles_df)
Output:
Let’s find articles from ‘pandasdataframe.com’ with more than 100 views.
import pandas as pd
data = {
'Title': ['Article 1', 'Article 2', 'Article 3', 'Article 4'],
'Domain': ['pandasdataframe.com', 'example.com', 'pandasdataframe.com', 'data.com'],
'Views': [150, 90, 200, 300]
}
df = pd.DataFrame(data)
articles_df = pd.DataFrame(data)
result = articles_df.loc[(articles_df['Domain'] == 'pandasdataframe.com') & (articles_df['Views'] > 100)]
print(result)
Output:
Example 5: Using loc
with Date Conditions
If you are working with time series data, you might need to filter based on date conditions. Let’s assume our dataset includes a date column.
import pandas as pd
data = {
'Title': ['Article 1', 'Article 2', 'Article 3', 'Article 4'],
'PublishedDate': ['2021-01-01', '2021-01-02', '2021-01-03', '2021-01-04'],
'Views': [150, 90, 200, 300]
}
articles_df = pd.DataFrame(data)
articles_df['PublishedDate'] = pd.to_datetime(articles_df['PublishedDate'])
# Filter articles published after '2021-01-01' with more than 100 views
result = articles_df.loc[(articles_df['PublishedDate'] > '2021-01-01') & (articles_df['Views'] > 100)]
print(result)
Output:
Example 6: Filtering with String Methods
Pandas also allows you to use string methods to perform filtering. Suppose you want to find articles where the title contains the word ‘Data’.
import pandas as pd
data = {
'Title': ['Article 1', 'Article 2', 'Article 3', 'Article 4'],
'PublishedDate': ['2021-01-01', '2021-01-02', '2021-01-03', '2021-01-04'],
'Views': [150, 90, 200, 300]
}
articles_df = pd.DataFrame(data)
articles_df['PublishedDate'] = pd.to_datetime(articles_df['PublishedDate'])
result = articles_df.loc[articles_df['Title'].str.contains('Data') & (articles_df['Views'] > 100)]
print(result)
Output:
Example 7: Case-Insensitive Filtering
If you need case-insensitive matching, you can use the case
parameter in string methods.
import pandas as pd
data = {
'Title': ['Article 1', 'Article 2', 'Article 3', 'Article 4'],
'PublishedDate': ['2021-01-01', '2021-01-02', '2021-01-03', '2021-01-04'],
'Views': [150, 90, 200, 300]
}
articles_df = pd.DataFrame(data)
articles_df['PublishedDate'] = pd.to_datetime(articles_df['PublishedDate'])
result = articles_df.loc[articles_df['Title'].str.contains('data', case=False) & (articles_df['Views'] > 100)]
print(result)
Output:
Example 8: Filtering with isin
The isin
method is useful when you need to filter data based on a list of values. For example, finding articles from either ‘pandasdataframe.com’ or ‘example.com’.
import pandas as pd
data = {
'Title': ['Article 1', 'Article 2', 'Article 3', 'Article 4'],
'PublishedDate': ['2021-01-01', '2021-01-02', '2021-01-03', '2021-01-04'],
'Views': [150, 90, 200, 300]
}
df = pd.DataFrame(data)
articles_df = pd.DataFrame(data)
articles_df['PublishedDate'] = pd.to_datetime(articles_df['PublishedDate'])
domains = ['pandasdataframe.com', 'example.com']
result = articles_df.loc[articles_df['Domain'].isin(domains) & (articles_df['Views'] > 100)]
print(result)
Example 9: Combining Conditions Across Different Columns
You can also combine conditions that involve different columns. For instance, finding articles where the number of views is greater than the average views.
import pandas as pd
data = {
'Title': ['Article 1', 'Article 2', 'Article 3', 'Article 4'],
'PublishedDate': ['2021-01-01', '2021-01-02', '2021-01-03', '2021-01-04'],
'Views': [150, 90, 200, 300]
}
df = pd.DataFrame(data)
articles_df = pd.DataFrame(data)
articles_df['PublishedDate'] = pd.to_datetime(articles_df['PublishedDate'])
average_views = articles_df['Views'].mean()
result = articles_df.loc[(articles_df['Domain'] == 'pandasdataframe.com') & (articles_df['Views'] > average_views)]
print(result)
Example 10: Using query
Method
Pandas also provides a query
method that can be used to perform filtering. This can sometimes be more readable, especially with complex conditions.
import pandas as pd
data = {
'Title': ['Article 1', 'Article 2', 'Article 3', 'Article 4'],
'PublishedDate': ['2021-01-01', '2021-01-02', '2021-01-03', '2021-01-04'],
'Views': [150, 90, 200, 300]
}
articles_df = pd.DataFrame(data)
articles_df['PublishedDate'] = pd.to_datetime(articles_df['PublishedDate'])
result = articles_df.query("Domain == 'pandasdataframe.com' and Views > 100")
print(result)
Pandas DataFrame Loc with 2 Conditions Conclusion
Filtering data based on multiple conditions is a common task in data analysis. Using the loc
method in Pandas, you can easily filter data in a DataFrame based on any number of conditions. This article provided several examples of how to use loc
with two conditions, combining different logical operators and methods. By mastering these techniques, you can efficiently perform complex data filtering and manipulation tasks in your data analysis projects.