Pandas DataFrame loc with Multiple Conditions

Pandas DataFrame loc with Multiple Conditions

Pandas is a powerful data manipulation library in Python that provides data structures and functions for effectively handling and analyzing large datasets. One of the most versatile features of Pandas is the DataFrame, which is essentially a table-like data structure. In this article, we will explore how to use the loc method in Pandas DataFrames to filter data based on multiple conditions.

Introduction to loc in Pandas

The loc method is used for accessing a group of rows and columns by labels or a boolean array. loc is primarily label based, but may also be used with a boolean array. The key point is that loc allows for label-based indexing, which means that you can specify the names of the rows and columns that you want to filter out of your dataset.

Basic Usage of loc

Before diving into multiple conditions, let’s look at a simple example of how loc is used. Suppose we have the following DataFrame:

import pandas as pd

data = {
    'Name': ['Alice', 'Bob', 'Charlie', 'David', 'Eve'],
    'Age': [25, 30, 35, 40, 45],
    'City': ['New York', 'Los Angeles', 'Chicago', 'Houston', 'Phoenix']
}

df = pd.DataFrame(data)

print(df)

Output:

Pandas DataFrame loc with Multiple Conditions

If we want to access the row for Bob, we can use loc like this:

import pandas as pd

data = {
    'Name': ['Alice', 'Bob', 'Charlie', 'David', 'Eve'],
    'Age': [25, 30, 35, 40, 45],
    'City': ['New York', 'Los Angeles', 'Chicago', 'Houston', 'Phoenix']
}

df = pd.DataFrame(data)

print(df.loc[df['Name'] == 'Bob'])

Output:

Pandas DataFrame loc with Multiple Conditions

Using loc with Multiple Conditions

When you need to filter a DataFrame based on multiple conditions, you can combine these conditions using logical operators such as & (and), | (or), and ~ (not).

Example 1: Using & Operator

Let’s say we want to find all people in the DataFrame who are over 30 years old and live in Los Angeles.

import pandas as pd

data = {
    'Name': ['Alice', 'Bob', 'Charlie', 'David', 'Eve'],
    'Age': [25, 30, 35, 40, 45],
    'City': ['New York', 'Los Angeles', 'Chicago', 'Houston', 'Phoenix']
}

df = pd.DataFrame(data)

print(df.loc[(df['Age'] > 30) & (df['City'] == 'Los Angeles')])

Output:

Pandas DataFrame loc with Multiple Conditions

Example 2: Using | Operator

Alternatively, if we want to find all people who are either over 40 years old or live in New York:

import pandas as pd

data = {
    'Name': ['Alice', 'Bob', 'Charlie', 'David', 'Eve'],
    'Age': [25, 30, 35, 40, 45],
    'City': ['New York', 'Los Angeles', 'Chicago', 'Houston', 'Phoenix']
}

df = pd.DataFrame(data)

print(df.loc[(df['Age'] > 40) | (df['City'] == 'New York')])

Output:

Pandas DataFrame loc with Multiple Conditions

Example 3: Using ~ Operator

If we want to find all people who do not live in Chicago:

import pandas as pd

data = {
    'Name': ['Alice', 'Bob', 'Charlie', 'David', 'Eve'],
    'Age': [25, 30, 35, 40, 45],
    'City': ['New York', 'Los Angeles', 'Chicago', 'Houston', 'Phoenix']
}

df = pd.DataFrame(data)

print(df.loc[~(df['City'] == 'Chicago')])

Output:

Pandas DataFrame loc with Multiple Conditions

Detailed Examples with Multiple Conditions

Now, let’s dive deeper and explore more complex examples using the loc method with multiple conditions. Each example will include a complete, standalone piece of code.

Example 4

import pandas as pd

data = {
    'Product': ['Laptop', 'Printer', 'Tablet', 'Desk', 'Chair'],
    'Price': [1200, 150, 300, 450, 200],
    'InStock': [True, False, True, True, False]
}

df = pd.DataFrame(data)

# Find all products that are in stock and cost more than 300
print(df.loc[(df['InStock'] == True) & (df['Price'] > 300)])

Output:

Pandas DataFrame loc with Multiple Conditions

Example 5

import pandas as pd

data = {
    'Website': ['pandasdataframe.com', 'example.com', 'testsite.com', 'pandasdataframe.com', 'datasite.com'],
    'Visits': [1000, 300, 400, 800, 500],
    'Revenue': [100, 20, 30, 90, 50]
}

df = pd.DataFrame(data)

# Find all records from pandasdataframe.com with more than 500 visits
print(df.loc[(df['Website'] == 'pandasdataframe.com') & (df['Visits'] > 500)])

Output:

Pandas DataFrame loc with Multiple Conditions

Example 6

import pandas as pd

data = {
    'Email': ['[email protected]', '[email protected]', '[email protected]', '[email protected]', '[email protected]'],
    'Subscription': [True, False, True, False, True],
    'Months': [12, 6, 3, 8, 10]
}

df = pd.DataFrame(data)

# Find all users with a subscription from pandasdataframe.com with more than 6 months
print(df.loc[(df['Email'].str.contains('pandasdataframe.com')) & (df['Subscription'] == True) & (df['Months'] > 6)])

Output:

Pandas DataFrame loc with Multiple Conditions

Example 7

import pandas as pd

data = {
    'ID': [1, 2, 3, 4, 5],
    'Score': [88, 92, 85, 91, 87],
    'School': ['SchoolA', 'SchoolB', 'SchoolA', 'SchoolC', 'SchoolB']
}

df = pd.DataFrame(data)

# Find all students from SchoolA or SchoolB with a score greater than 85
print(df.loc[((df['School'] == 'SchoolA') | (df['School'] == 'SchoolB')) & (df['Score'] > 85)])

Output:

Pandas DataFrame loc with Multiple Conditions

Example 8

import pandas as pd

data = {
    'Name': ['Alice', 'Bob', 'Charlie', 'David', 'Eve'],
    'Salary': [70000, 80000, 90000, 100000, 110000],
    'Department': ['HR', 'Marketing', 'IT', 'Finance', 'HR']
}

df = pd.DataFrame(data)

# Find all employees in the HR department with a salary greater than 80000
print(df.loc[(df['Department'] == 'HR') & (df['Salary'] > 80000)])

Output:

Pandas DataFrame loc with Multiple Conditions

Example 9

import pandas as pd

data = {
    'Product': ['pandasdataframe.com Book', 'pandasdataframe.com Course', 'Other Book', 'Other Course', 'pandasdataframe.com Guide'],
    'Price': [29, 99, 19, 39, 49],
    'Rating': [4.5, 4.8, 3.7, 4.0, 4.9]
}

df = pd.DataFrame(data)

# Find all products related to pandasdataframe.com with a rating of at least 4.5
print(df.loc[(df['Product'].str.contains('pandasdataframe.com')) & (df['Rating'] >= 4.5)])

Output:

Pandas DataFrame loc with Multiple Conditions

Example 10

import pandas as pd

data = {
    'Username': ['user1', 'user2', 'user3', 'user4', 'user5'],
    'Login': ['pandasdataframe.com', 'example.com', 'pandasdataframe.com', 'testsite.com', 'pandasdataframe.com'],
    'Active': [True, False, True, False, True]
}

df = pd.DataFrame(data)

# Find all active users who logged in through pandasdataframe.com
print(df.loc[(df['Login'] == 'pandasdataframe.com') & (df['Active'] == True)])

Output:

Pandas DataFrame loc with Multiple Conditions

Pandas DataFrame loc with Multiple Conditions Conclusion

Using the loc method with multiple conditions in Pandas is a powerful way to filter and analyze data. By combining conditions with logical operators, you can perform complex queries on your dataframes. This functionality is particularly useful in data analysis, allowing you to easily extract insights from large datasets. Remember to always ensure that your conditions are enclosed in parentheses to avoid any ambiguity in the order of operations.