Pandas DataFrame loc Multiple Conditions

Pandas is a powerful data manipulation library for Python. It provides the data structures and functions needed to work with structured data. Its most commonly used data structure is the DataFrame: a two-dimensional labeled table whose columns can hold different types. It is similar to a spreadsheet, an SQL table, or a dictionary of Series objects.

In this article, we focus on the DataFrame loc indexer, and specifically on how to use it with multiple conditions. loc is label-based: you select rows and columns by their labels (index values and column names), or by a boolean array of matching length. Unlike the integer position-based iloc, a loc slice includes its end label.
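The difference between label slicing and position slicing is easiest to see with non-integer labels. The following is a minimal sketch using a small, hypothetical DataFrame with the string labels 'a' to 'd' (not the article's sample data):

```python
import pandas as pd

# Hypothetical example data with string index labels
df = pd.DataFrame(
    {'score': [10, 20, 30, 40]},
    index=['a', 'b', 'c', 'd'],
)

# loc slices by label and INCLUDES the end label 'c'
by_label = df.loc['a':'c']
print(by_label.index.tolist())      # ['a', 'b', 'c']

# iloc slices by position and EXCLUDES the end position 2
by_position = df.iloc[0:2]
print(by_position.index.tolist())   # ['a', 'b']
```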

Basic Usage of loc

Before we dive into using loc with multiple conditions, let’s first understand its basic usage. Here is an example:

import pandas as pd

data = {
    'name': ['John', 'Anna', 'Peter', 'Linda'],
    'age': [28, 24, 35, 32],
    'city': ['New York', 'Paris', 'Berlin', 'London']
}

df = pd.DataFrame(data)

# Select the rows with index labels 0 and 2
print(df.loc[[0, 2]])

Output:

    name  age      city
0   John   28  New York
2  Peter   35    Berlin

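In this example, we created a DataFrame from a dictionary and then passed a list of index labels to loc to select the rows labeled 0 and 2. Because the DataFrame uses the default integer index, these labels happen to match the row positions.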
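loc also takes a second argument for columns, in the form df.loc[rows, columns]. A short sketch with the same sample data shows a scalar lookup, a row and column subset, and the inclusive end of a label slice:

```python
import pandas as pd

data = {
    'name': ['John', 'Anna', 'Peter', 'Linda'],
    'age': [28, 24, 35, 32],
    'city': ['New York', 'Paris', 'Berlin', 'London']
}

df = pd.DataFrame(data)

# One row label and one column label return a single value
print(df.loc[1, 'city'])         # Paris

# Lists of row labels and column labels return a DataFrame
subset = df.loc[[0, 2], ['name', 'city']]
print(subset)

# A label slice includes its end label, so 0:2 returns three rows
print(len(df.loc[0:2]))          # 3
```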

Using loc with a Single Condition

loc also accepts a condition: it keeps the rows where the condition is True. Here is an example:

import pandas as pd

data = {
    'name': ['John', 'Anna', 'Peter', 'Linda'],
    'age': [28, 24, 35, 32],
    'city': ['New York', 'Paris', 'Berlin', 'London']
}

df = pd.DataFrame(data)

# Select rows where age is greater than 30
print(df.loc[df['age'] > 30])

Output:

    name  age    city
2  Peter   35  Berlin
3  Linda   32  London

In this example, df['age'] > 30 produces a boolean Series with one True or False value per row, and loc keeps only the rows where that value is True.
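It can help to look at the boolean mask on its own before passing it to loc. This sketch uses the same sample data and stores the mask in a variable (the name mask is just an illustrative choice):

```python
import pandas as pd

data = {
    'name': ['John', 'Anna', 'Peter', 'Linda'],
    'age': [28, 24, 35, 32],
    'city': ['New York', 'Paris', 'Berlin', 'London']
}

df = pd.DataFrame(data)

# The comparison returns a boolean Series aligned with df's index
mask = df['age'] > 30
print(mask.tolist())             # [False, False, True, True]

# Passing the mask to loc keeps only the rows where it is True
older = df.loc[mask]
print(older['name'].tolist())    # ['Peter', 'Linda']
```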

Using loc with Multiple Conditions

Now let’s see how to use loc with multiple conditions. We can do this by combining conditions with the & (and) or | (or) operators. Here are some examples:

import pandas as pd

data = {
    'name': ['John', 'Anna', 'Peter', 'Linda'],
    'age': [28, 24, 35, 32],
    'city': ['New York', 'Paris', 'Berlin', 'London']
}

df = pd.DataFrame(data)

# Select rows where age is greater than 30 and city is 'Berlin'
print(df.loc[(df['age'] > 30) & (df['city'] == 'Berlin')])

# Select rows where age is greater than 30 or city is 'Berlin'
print(df.loc[(df['age'] > 30) | (df['city'] == 'Berlin')])

Output:

    name  age    city
2  Peter   35  Berlin
    name  age    city
2  Peter   35  Berlin
3  Linda   32  London

In the first example, we used loc to select rows where the age is greater than 30 and the city is ‘Berlin’. In the second example, we used loc to select rows where the age is greater than 30 or the city is ‘Berlin’.
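The & and | operators work element-wise: they combine two boolean Series row by row into a new boolean Series. The sketch below makes the intermediate masks visible, and also shows two conditions on the same column and Series.isin, which is a shorthand for several == comparisons joined with |:

```python
import pandas as pd

data = {
    'name': ['John', 'Anna', 'Peter', 'Linda'],
    'age': [28, 24, 35, 32],
    'city': ['New York', 'Paris', 'Berlin', 'London']
}

df = pd.DataFrame(data)

is_older = df['age'] > 30            # [False, False, True, True]
in_berlin = df['city'] == 'Berlin'   # [False, False, True, False]

# & is True only where both masks are True; | where at least one is
both = is_older & in_berlin          # [False, False, True, False]
either = is_older | in_berlin        # [False, False, True, True]

print(df.loc[both, 'name'].tolist())      # ['Peter']
print(df.loc[either, 'name'].tolist())    # ['Peter', 'Linda']

# Two conditions on the same column: ages from 25 to 32 inclusive
in_range = (df['age'] >= 25) & (df['age'] <= 32)
print(df.loc[in_range, 'name'].tolist())  # ['John', 'Linda']

# isin is equivalent to (city == 'Berlin') | (city == 'London')
in_cities = df['city'].isin(['Berlin', 'London'])
print(df.loc[in_cities, 'name'].tolist()) # ['Peter', 'Linda']
```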

Using loc with Multiple Conditions and Specific Columns

We can also use loc with multiple conditions and select specific columns. Here is an example:

import pandas as pd

data = {
    'name': ['John', 'Anna', 'Peter', 'Linda'],
    'age': [28, 24, 35, 32],
    'city': ['New York', 'Paris', 'Berlin', 'London']
}

df = pd.DataFrame(data)

# Select name and city columns for rows where age is greater than 30 and city is 'Berlin'
print(df.loc[(df['age'] > 30) & (df['city'] == 'Berlin'), ['name', 'city']])

Output:

    name    city
2  Peter  Berlin

In this example, we used loc to select the name and city columns for rows where the age is greater than 30 and the city is ‘Berlin’.
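The form of the column selector determines what loc returns. A minimal sketch with the same sample data and the same filter: a list of column labels gives a DataFrame, a single label gives a Series, and a label slice of columns includes its end column:

```python
import pandas as pd

data = {
    'name': ['John', 'Anna', 'Peter', 'Linda'],
    'age': [28, 24, 35, 32],
    'city': ['New York', 'Paris', 'Berlin', 'London']
}

df = pd.DataFrame(data)
mask = (df['age'] > 30) & (df['city'] == 'Berlin')

# A list of column labels returns a DataFrame, even for one column
as_frame = df.loc[mask, ['name']]
print(type(as_frame).__name__)    # DataFrame

# A single column label returns a Series
as_series = df.loc[mask, 'name']
print(type(as_series).__name__)   # Series

# A column label slice includes the end column 'age'
cols = df.loc[mask, 'name':'age'].columns.tolist()
print(cols)                       # ['name', 'age']
```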

Using loc with Multiple Conditions and a Function

A condition can also come from a custom function applied to a column with the Series apply method. Here is an example:

import pandas as pd

data = {
    'name': ['John', 'Anna', 'Peter', 'Linda'],
    'age': [28, 24, 35, 32],
    'city': ['New York', 'Paris', 'Berlin', 'London']
}

df = pd.DataFrame(data)

# Define a function to check if the city is 'Berlin'
def is_berlin(city):
    return city == 'Berlin'

# Select rows where age is greater than 30 and city is 'Berlin'
print(df.loc[(df['age'] > 30) & df['city'].apply(is_berlin)])

Output:

    name  age    city
2  Peter   35  Berlin

In this example, we defined a function to check if the city is ‘Berlin’. Then we used loc to select rows where the age is greater than 30 and the city is ‘Berlin’.
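For a simple test like this one, a direct comparison such as df['city'] == 'Berlin' gives the same mask without calling a Python function per row, and is usually faster on large DataFrames. loc also accepts a callable that receives the DataFrame and returns a boolean Series, which can be convenient when chaining methods. A sketch of both, under the same sample data:

```python
import pandas as pd

data = {
    'name': ['John', 'Anna', 'Peter', 'Linda'],
    'age': [28, 24, 35, 32],
    'city': ['New York', 'Paris', 'Berlin', 'London']
}

df = pd.DataFrame(data)

# Vectorized comparison instead of apply(is_berlin)
vectorized = df.loc[(df['age'] > 30) & (df['city'] == 'Berlin')]

# A callable passed to loc is called with the DataFrame itself;
# here it returns the combined boolean mask
with_callable = df.loc[lambda d: (d['age'] > 30) & (d['city'] == 'Berlin')]

print(vectorized.equals(with_callable))   # True
print(with_callable['name'].tolist())     # ['Peter']
```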

Conclusion

In this article, we learned how to filter a pandas DataFrame with loc using multiple conditions. We started with basic label-based selection and a single condition, then combined conditions with the & (and) and | (or) operators, selected specific columns alongside the row filter, and built a condition from a custom function.

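Remember to wrap each condition in parentheses when you combine them. In Python, & and | bind more tightly than comparison operators such as > and ==, so without parentheses df['age'] > 30 & df['city'] == 'Berlin' is evaluated as df['age'] > (30 & df['city']) == 'Berlin' and raises an error. Also use & and | rather than Python's and and or keywords: those try to reduce a whole Series to a single True or False, which pandas refuses with a ValueError.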
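These pitfalls can be reproduced with the article's sample data. In this sketch the first two attempts are expected to raise exceptions (the errors list only records their type names), and the last part shows ~, the element-wise NOT operator, which likewise needs its condition in parentheses:

```python
import pandas as pd

data = {
    'name': ['John', 'Anna', 'Peter', 'Linda'],
    'age': [28, 24, 35, 32],
    'city': ['New York', 'Paris', 'Berlin', 'London']
}

df = pd.DataFrame(data)
errors = []

# Missing parentheses: & is evaluated first, as 30 & df['city']
try:
    df.loc[df['age'] > 30 & df['city'] == 'Berlin']
except (TypeError, ValueError) as exc:
    errors.append(type(exc).__name__)

# Python's `and` calls bool() on a Series, which raises ValueError
try:
    df.loc[(df['age'] > 30) and (df['city'] == 'Berlin')]
except ValueError as exc:
    errors.append(type(exc).__name__)

print(errors)

# Correct form: each condition in parentheses, combined with &
correct = df.loc[(df['age'] > 30) & (df['city'] == 'Berlin')]

# ~ inverts a mask: rows whose city is NOT Berlin
not_berlin = df.loc[~(df['city'] == 'Berlin')]
print(not_berlin['name'].tolist())   # ['John', 'Anna', 'Linda']
```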