Pandas DataFrame loc Multiple Conditions
Pandas is a powerful data manipulation library in Python. It provides data structures and functions needed to manipulate structured data. One of the most commonly used data structures in pandas is DataFrame. DataFrame is a two-dimensional labeled data structure with columns of potentially different types. It is similar to a spreadsheet or SQL table, or a dictionary of Series objects.
In this article, we will focus on the loc indexer of a pandas DataFrame, specifically on how to use it with multiple conditions. loc selects data by label: we pass the row or column labels (or a boolean mask) for the data we want. When slicing with loc, the end label is included in the result, unlike iloc, which selects by integer position and excludes the end position.
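The difference between the two slicing behaviours can be seen in a short sketch. It assumes a small DataFrame with the default integer index, so that labels and positions happen to coincide:

```python
import pandas as pd

df = pd.DataFrame({
    'name': ['John', 'Anna', 'Peter', 'Linda'],
    'age': [28, 24, 35, 32],
})

# loc slices by label and includes the end label: rows 0, 1 and 2
by_label = df.loc[0:2]

# iloc slices by position and excludes the end position: rows 0 and 1
by_position = df.iloc[0:2]

print(len(by_label))     # 3
print(len(by_position))  # 2
```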
Basic Usage of loc
Before we dive into using loc with multiple conditions, let’s first understand its basic usage. Here is an example:
import pandas as pd
data = {
'name': ['John', 'Anna', 'Peter', 'Linda'],
'age': [28, 24, 35, 32],
'city': ['New York', 'Paris', 'Berlin', 'London']
}
df = pd.DataFrame(data)
# Select rows with index 0 and 2
print(df.loc[[0, 2]])
In this example, we created a DataFrame from a dictionary. Then we used loc to select the rows labeled 0 and 2, which are the rows for John and Peter.
Using loc with a Single Condition
We can also use loc with a single condition. Here is an example:
import pandas as pd
data = {
'name': ['John', 'Anna', 'Peter', 'Linda'],
'age': [28, 24, 35, 32],
'city': ['New York', 'Paris', 'Berlin', 'London']
}
df = pd.DataFrame(data)
# Select rows where age is greater than 30
print(df.loc[df['age'] > 30])
In this example, the expression df['age'] > 30 produces a boolean Series, and loc keeps only the rows where it is True. Here those are the rows for Peter and Linda.
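To make the mechanism explicit, the condition can be stored in a variable and inspected before it is passed to loc. This is a minimal sketch on the same sample data; the variable names mask and older are only illustrative:

```python
import pandas as pd

df = pd.DataFrame({
    'name': ['John', 'Anna', 'Peter', 'Linda'],
    'age': [28, 24, 35, 32],
    'city': ['New York', 'Paris', 'Berlin', 'London'],
})

# The comparison is evaluated for every row and yields one boolean per row
mask = df['age'] > 30
print(mask.tolist())           # [False, False, True, True]

# loc keeps the rows where the mask is True
older = df.loc[mask]
print(older['name'].tolist())  # ['Peter', 'Linda']
```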
Using loc with Multiple Conditions
Now let’s see how to use loc with multiple conditions. We can do this by combining conditions with the & (and) or | (or) operators, wrapping each condition in parentheses. Here are some examples:
import pandas as pd
data = {
'name': ['John', 'Anna', 'Peter', 'Linda'],
'age': [28, 24, 35, 32],
'city': ['New York', 'Paris', 'Berlin', 'London']
}
df = pd.DataFrame(data)
# Select rows where age is greater than 30 and city is 'Berlin'
print(df.loc[(df['age'] > 30) & (df['city'] == 'Berlin')])
# Select rows where age is greater than 30 or city is 'Berlin'
print(df.loc[(df['age'] > 30) | (df['city'] == 'Berlin')])
In the first example, we used loc to select rows where the age is greater than 30 and the city is ‘Berlin’, which matches only Peter. In the second example, we used loc to select rows where the age is greater than 30 or the city is ‘Berlin’, which matches Peter and Linda.
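When conditions get longer, it can help to name each one before combining them. The sketch below does that on the same sample data and also shows the ~ operator, which negates a condition; the variable names are illustrative and not part of the examples above:

```python
import pandas as pd

df = pd.DataFrame({
    'name': ['John', 'Anna', 'Peter', 'Linda'],
    'age': [28, 24, 35, 32],
    'city': ['New York', 'Paris', 'Berlin', 'London'],
})

# Name each condition so the combined expression stays readable
over_30 = df['age'] > 30
in_berlin = df['city'] == 'Berlin'

both = df.loc[over_30 & in_berlin]       # both conditions hold
either = df.loc[over_30 | in_berlin]     # at least one condition holds
not_over_30 = df.loc[~over_30]           # ~ inverts a boolean Series

print(both['name'].tolist())         # ['Peter']
print(either['name'].tolist())       # ['Peter', 'Linda']
print(not_over_30['name'].tolist())  # ['John', 'Anna']
```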
Using loc with Multiple Conditions and Specific Columns
We can also use loc with multiple conditions and select specific columns. Here is an example:
import pandas as pd
data = {
'name': ['John', 'Anna', 'Peter', 'Linda'],
'age': [28, 24, 35, 32],
'city': ['New York', 'Paris', 'Berlin', 'London']
}
df = pd.DataFrame(data)
# Select name and city columns for rows where age is greater than 30 and city is 'Berlin'
print(df.loc[(df['age'] > 30) & (df['city'] == 'Berlin'), ['name', 'city']])
In this example, we used loc to select the name and city columns for rows where the age is greater than 30 and the city is ‘Berlin’. The row condition comes first and the list of column labels second, separated by a comma.
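One detail worth checking is the type of the result: a list of column labels gives back a DataFrame, while a single label gives back a Series. A small sketch on the same sample data, with illustrative variable names:

```python
import pandas as pd

df = pd.DataFrame({
    'name': ['John', 'Anna', 'Peter', 'Linda'],
    'age': [28, 24, 35, 32],
    'city': ['New York', 'Paris', 'Berlin', 'London'],
})

condition = (df['age'] > 30) & (df['city'] == 'Berlin')

# A list of labels selects several columns and returns a DataFrame
subset = df.loc[condition, ['name', 'city']]

# A single label selects one column and returns a Series
names = df.loc[condition, 'name']

print(type(subset).__name__)  # DataFrame
print(type(names).__name__)   # Series
print(names.tolist())         # ['Peter']
```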
Using loc with Multiple Conditions and a Function
We can also use loc with multiple conditions and a function. Here is an example:
import pandas as pd
data = {
'name': ['John', 'Anna', 'Peter', 'Linda'],
'age': [28, 24, 35, 32],
'city': ['New York', 'Paris', 'Berlin', 'London']
}
df = pd.DataFrame(data)
# Define a function to check if the city is 'Berlin'
def is_berlin(city):
    return city == 'Berlin'
# Select rows where age is greater than 30 and city is 'Berlin'
print(df.loc[(df['age'] > 30) & df['city'].apply(is_berlin)])
In this example, we defined a function to check if the city is ‘Berlin’. Applying it to the city column with apply returns a boolean Series, which we combined with the age condition to select rows where the age is greater than 30 and the city is ‘Berlin’.
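As a variation, loc also accepts a callable that receives the DataFrame and returns a boolean mask. The sketch below compares that form with the apply-based version on the same sample data; it is one possible alternative, not a replacement for the approach above:

```python
import pandas as pd

df = pd.DataFrame({
    'name': ['John', 'Anna', 'Peter', 'Linda'],
    'age': [28, 24, 35, 32],
    'city': ['New York', 'Paris', 'Berlin', 'London'],
})

def is_berlin(city):
    return city == 'Berlin'

# Element-wise function applied to one column, combined with another condition
via_apply = df.loc[(df['age'] > 30) & df['city'].apply(is_berlin)]

# Callable passed directly to loc; it is called with the DataFrame itself
via_callable = df.loc[lambda d: (d['age'] > 30) & (d['city'] == 'Berlin')]

print(via_apply.equals(via_callable))  # True
```

For a simple equality check like this one, the direct comparison d['city'] == 'Berlin' is usually sufficient; apply is more useful when the per-value logic cannot be written as a single vectorized expression.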
Conclusion
In this article, we learned how to use the loc indexer of a pandas DataFrame with multiple conditions. We saw how to combine conditions with the & (and) or | (or) operators, how to select specific columns, and how to use a function. We also saw its basic usage and an example of using loc with a single condition.
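Remember that when using multiple conditions with loc, each condition must be surrounded by parentheses. The & and | operators bind more tightly than comparison operators such as > and ==, so without the parentheses the expression is not evaluated as intended. Also use & and | rather than Python's "and" and "or" keywords, which do not work element-wise on a Series.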
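Both pitfalls can be reproduced in a short sketch. It uses a hypothetical age-range filter on the same ages as the examples above, and records the exception raised by each incorrect form:

```python
import pandas as pd

df = pd.DataFrame({'age': [28, 24, 35, 32]})

# Correct: each condition is parenthesized and combined with &
in_range = df.loc[(df['age'] > 25) & (df['age'] < 35)]
print(in_range['age'].tolist())  # [28, 32]

errors = []

# Missing parentheses: & is evaluated first, so this is parsed as
# df['age'] > (25 & df['age']) < 35, a chained comparison that fails
try:
    df.loc[df['age'] > 25 & df['age'] < 35]
except ValueError as exc:
    errors.append(type(exc).__name__)

# Python's "and" asks for a single truth value of the whole Series
try:
    df.loc[(df['age'] > 25) and (df['age'] < 35)]
except ValueError as exc:
    errors.append(type(exc).__name__)

print(errors)  # ['ValueError', 'ValueError']
```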