Pandas DataFrame loc with Multiple Conditions
Pandas is a powerful data manipulation library in Python that provides data structures and functions for effectively handling and analyzing large datasets. One of the most versatile features of Pandas is the DataFrame
, which is essentially a table-like data structure. In this article, we will explore how to use the loc
method in Pandas DataFrames to filter data based on multiple conditions.
Introduction to loc
in Pandas
The loc
method is used for accessing a group of rows and columns by labels or a boolean array. loc
is primarily label based, but may also be used with a boolean array. The key point is that loc
allows for label-based indexing, which means that you can specify the names of the rows and columns that you want to filter out of your dataset.
Basic Usage of loc
Before diving into multiple conditions, let’s look at a simple example of how loc
is used. Suppose we have the following DataFrame:
import pandas as pd
data = {
'Name': ['Alice', 'Bob', 'Charlie', 'David', 'Eve'],
'Age': [25, 30, 35, 40, 45],
'City': ['New York', 'Los Angeles', 'Chicago', 'Houston', 'Phoenix']
}
df = pd.DataFrame(data)
print(df)
Output:
If we want to access the row for Bob, we can use loc
like this:
import pandas as pd
data = {
'Name': ['Alice', 'Bob', 'Charlie', 'David', 'Eve'],
'Age': [25, 30, 35, 40, 45],
'City': ['New York', 'Los Angeles', 'Chicago', 'Houston', 'Phoenix']
}
df = pd.DataFrame(data)
print(df.loc[df['Name'] == 'Bob'])
Output:
Using loc
with Multiple Conditions
When you need to filter a DataFrame based on multiple conditions, you can combine these conditions using logical operators such as &
(and), |
(or), and ~
(not).
Example 1: Using &
Operator
Let’s say we want to find all people in the DataFrame who are over 30 years old and live in Los Angeles.
import pandas as pd
data = {
'Name': ['Alice', 'Bob', 'Charlie', 'David', 'Eve'],
'Age': [25, 30, 35, 40, 45],
'City': ['New York', 'Los Angeles', 'Chicago', 'Houston', 'Phoenix']
}
df = pd.DataFrame(data)
print(df.loc[(df['Age'] > 30) & (df['City'] == 'Los Angeles')])
Output:
Example 2: Using |
Operator
Alternatively, if we want to find all people who are either over 40 years old or live in New York:
import pandas as pd
data = {
'Name': ['Alice', 'Bob', 'Charlie', 'David', 'Eve'],
'Age': [25, 30, 35, 40, 45],
'City': ['New York', 'Los Angeles', 'Chicago', 'Houston', 'Phoenix']
}
df = pd.DataFrame(data)
print(df.loc[(df['Age'] > 40) | (df['City'] == 'New York')])
Output:
Example 3: Using ~
Operator
If we want to find all people who do not live in Chicago:
import pandas as pd
data = {
'Name': ['Alice', 'Bob', 'Charlie', 'David', 'Eve'],
'Age': [25, 30, 35, 40, 45],
'City': ['New York', 'Los Angeles', 'Chicago', 'Houston', 'Phoenix']
}
df = pd.DataFrame(data)
print(df.loc[~(df['City'] == 'Chicago')])
Output:
Detailed Examples with Multiple Conditions
Now, let’s dive deeper and explore more complex examples using the loc
method with multiple conditions. Each example will include a complete, standalone piece of code.
Example 4
import pandas as pd
data = {
'Product': ['Laptop', 'Printer', 'Tablet', 'Desk', 'Chair'],
'Price': [1200, 150, 300, 450, 200],
'InStock': [True, False, True, True, False]
}
df = pd.DataFrame(data)
# Find all products that are in stock and cost more than 300
print(df.loc[(df['InStock'] == True) & (df['Price'] > 300)])
Output:
Example 5
import pandas as pd
data = {
'Website': ['pandasdataframe.com', 'example.com', 'testsite.com', 'pandasdataframe.com', 'datasite.com'],
'Visits': [1000, 300, 400, 800, 500],
'Revenue': [100, 20, 30, 90, 50]
}
df = pd.DataFrame(data)
# Find all records from pandasdataframe.com with more than 500 visits
print(df.loc[(df['Website'] == 'pandasdataframe.com') & (df['Visits'] > 500)])
Output:
Example 6
import pandas as pd
data = {
'Email': ['[email protected]', '[email protected]', '[email protected]', '[email protected]', '[email protected]'],
'Subscription': [True, False, True, False, True],
'Months': [12, 6, 3, 8, 10]
}
df = pd.DataFrame(data)
# Find all users with a subscription from pandasdataframe.com with more than 6 months
print(df.loc[(df['Email'].str.contains('pandasdataframe.com')) & (df['Subscription'] == True) & (df['Months'] > 6)])
Output:
Example 7
import pandas as pd
data = {
'ID': [1, 2, 3, 4, 5],
'Score': [88, 92, 85, 91, 87],
'School': ['SchoolA', 'SchoolB', 'SchoolA', 'SchoolC', 'SchoolB']
}
df = pd.DataFrame(data)
# Find all students from SchoolA or SchoolB with a score greater than 85
print(df.loc[((df['School'] == 'SchoolA') | (df['School'] == 'SchoolB')) & (df['Score'] > 85)])
Output:
Example 8
import pandas as pd
data = {
'Name': ['Alice', 'Bob', 'Charlie', 'David', 'Eve'],
'Salary': [70000, 80000, 90000, 100000, 110000],
'Department': ['HR', 'Marketing', 'IT', 'Finance', 'HR']
}
df = pd.DataFrame(data)
# Find all employees in the HR department with a salary greater than 80000
print(df.loc[(df['Department'] == 'HR') & (df['Salary'] > 80000)])
Output:
Example 9
import pandas as pd
data = {
'Product': ['pandasdataframe.com Book', 'pandasdataframe.com Course', 'Other Book', 'Other Course', 'pandasdataframe.com Guide'],
'Price': [29, 99, 19, 39, 49],
'Rating': [4.5, 4.8, 3.7, 4.0, 4.9]
}
df = pd.DataFrame(data)
# Find all products related to pandasdataframe.com with a rating of at least 4.5
print(df.loc[(df['Product'].str.contains('pandasdataframe.com')) & (df['Rating'] >= 4.5)])
Output:
Example 10
import pandas as pd
data = {
'Username': ['user1', 'user2', 'user3', 'user4', 'user5'],
'Login': ['pandasdataframe.com', 'example.com', 'pandasdataframe.com', 'testsite.com', 'pandasdataframe.com'],
'Active': [True, False, True, False, True]
}
df = pd.DataFrame(data)
# Find all active users who logged in through pandasdataframe.com
print(df.loc[(df['Login'] == 'pandasdataframe.com') & (df['Active'] == True)])
Output:
Pandas DataFrame loc with Multiple Conditions Conclusion
Using the loc
method with multiple conditions in Pandas is a powerful way to filter and analyze data. By combining conditions with logical operators, you can perform complex queries on your dataframes. This functionality is particularly useful in data analysis, allowing you to easily extract insights from large datasets. Remember to always ensure that your conditions are enclosed in parentheses to avoid any ambiguity in the order of operations.