Pandas loc Condition

Pandas is a data manipulation library for Python, widely used in data analysis and machine learning. One of its core features is conditional selection through the loc indexer. This article shows how to use loc to filter a DataFrame based on conditions, from single-column comparisons to conditions combined across several columns, and how to select specific columns at the same time.

Introduction to Pandas loc

The loc indexer accesses a group of rows and columns by label or by a boolean array. It is primarily label-based: you pass the row labels and column labels you want to select. It also accepts a boolean array (or boolean Series) with one value per row, in which case it returns the rows where the value is True. Conditional filtering relies on this second form.
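
The two access styles can be compared side by side. The following is a minimal sketch, assuming a small hypothetical frame named people with string row labels (neither the frame nor its labels come from the examples in this article):

```python
import pandas as pd

# Hypothetical frame with string row labels so label-based access is visible
people = pd.DataFrame(
    {"Age": [25, 30, 35]},
    index=["alice", "bob", "charlie"],
)

# Label-based: one row label and one column label
age_of_bob = people.loc["bob", "Age"]

# Boolean-based: one True/False flag per row, in row order
flags = [True, False, True]
selected = people.loc[flags]

print(age_of_bob)
print(selected)
```

A condition such as a column comparison simply produces these flags for you, one per row.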

The examples in this article use the following sample DataFrame:

import pandas as pd

# Sample DataFrame
data = {
    'Name': ['Alice', 'Bob', 'Charlie', 'David', 'Eve'],
    'Age': [25, 30, 35, 40, 45],
    'Email': ['[email protected]', '[email protected]', '[email protected]', '[email protected]', '[email protected]']
}
df = pd.DataFrame(data)

print(df)

Basic Usage of loc

Selecting Rows by Condition

The simplest condition compares a column against a value. The comparison produces a boolean Series, and loc keeps the rows where it is True. For example, to select rows where the age is greater than 30:

import pandas as pd

# Sample DataFrame
data = {
    'Name': ['Alice', 'Bob', 'Charlie', 'David', 'Eve'],
    'Age': [25, 30, 35, 40, 45],
    'Email': ['[email protected]', '[email protected]', '[email protected]', '[email protected]', '[email protected]']
}
df = pd.DataFrame(data)

# Select rows where age is greater than 30
result = df.loc[df['Age'] > 30]
print(result)
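
It can help to look at the condition on its own before passing it to loc. The sketch below stores it in a variable named mask (a name chosen here for illustration) and uses a reduced copy of the sample data without the Email column:

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["Alice", "Bob", "Charlie", "David", "Eve"],
    "Age": [25, 30, 35, 40, 45],
})

# The comparison is evaluated row by row and returns a boolean Series
mask = df["Age"] > 30
print(mask.tolist())

# True counts as 1, so the sum is the number of matching rows
print(mask.sum())

# Passing the mask to loc keeps only the rows flagged True
older = df.loc[mask]
print(older)
```

Naming the mask is also a convenient way to reuse one condition in several selections.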

Selecting Specific Columns with Condition

loc takes a row selector and, optionally, a column selector, separated by a comma. This lets you apply a condition and pick columns in one step:

import pandas as pd

# Sample DataFrame
data = {
    'Name': ['Alice', 'Bob', 'Charlie', 'David', 'Eve'],
    'Age': [25, 30, 35, 40, 45],
    'Email': ['[email protected]', '[email protected]', '[email protected]', '[email protected]', '[email protected]']
}
df = pd.DataFrame(data)

# Select only the Name and Email of persons older than 30
result = df.loc[df['Age'] > 30, ['Name', 'Email']]
print(result)
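
The same row-condition and column pattern can also appear on the left-hand side of an assignment, which updates only the matching rows. This is a minimal sketch; the Group column and its labels are hypothetical and not part of the sample data:

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["Alice", "Bob", "Charlie", "David", "Eve"],
    "Age": [25, 30, 35, 40, 45],
})

# Hypothetical column: start with a default value for every row
df["Group"] = "30 or under"

# Overwrite the value only where the condition holds
df.loc[df["Age"] > 30, "Group"] = "over 30"

print(df)
```

Assigning through a single loc call, rather than chaining two indexing steps, writes to the original DataFrame directly.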

Advanced Conditional Selections

Using AND (&) Condition

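You can combine multiple conditions using the & operator. Wrap each condition in parentheses, because & binds more tightly than comparison operators such as >: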

import pandas as pd

# Sample DataFrame
data = {
    'Name': ['Alice', 'Bob', 'Charlie', 'David', 'Eve'],
    'Age': [25, 30, 35, 40, 45],
    'Email': ['[email protected]', '[email protected]', '[email protected]', '[email protected]', '[email protected]']
}
df = pd.DataFrame(data)

# Select rows where age is greater than 30 and name starts with 'C'
result = df.loc[(df['Age'] > 30) & (df['Name'].str.startswith('C'))]
print(result)
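
A common mistake is to reach for Python's and keyword instead of &. The sketch below, using a reduced copy of the sample data, shows the working form and records that the and version raises a ValueError, because and asks for a single truth value for a whole Series:

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["Alice", "Bob", "Charlie", "David", "Eve"],
    "Age": [25, 30, 35, 40, 45],
})

# Element-wise AND: each parenthesised comparison is evaluated first
ok = df.loc[(df["Age"] > 30) & (df["Age"] < 45)]
print(ok)

# Python's `and` needs one True/False for the entire Series, which is ambiguous
try:
    df.loc[(df["Age"] > 30) and (df["Age"] < 45)]
    raised = False
except ValueError:
    raised = True

print(raised)
```

The same applies to or and not: use |, and ~ for element-wise logic.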

Using OR (|) Condition

Similarly, use the | operator to combine conditions with OR logic. A row is selected if it satisfies at least one of the conditions:

import pandas as pd

# Sample DataFrame
data = {
    'Name': ['Alice', 'Bob', 'Charlie', 'David', 'Eve'],
    'Age': [25, 30, 35, 40, 45],
    'Email': ['[email protected]', '[email protected]', '[email protected]', '[email protected]', '[email protected]']
}
df = pd.DataFrame(data)

# Select rows where age is less than 35 or name starts with 'D'
result = df.loc[(df['Age'] < 35) | (df['Name'].str.startswith('D'))]
print(result)

Using NOT (~) Condition

To select rows that do not match a condition, use the ~ operator:

import pandas as pd

# Sample DataFrame
data = {
    'Name': ['Alice', 'Bob', 'Charlie', 'David', 'Eve'],
    'Age': [25, 30, 35, 40, 45],
    'Email': ['[email protected]', '[email protected]', '[email protected]', '[email protected]', '[email protected]']
}
df = pd.DataFrame(data)

# Select rows where name does not start with 'A'
result = df.loc[~(df['Name'].str.startswith('A'))]
print(result)
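
The ~ operator can also negate a combined condition, as long as the whole expression is wrapped in parentheses. A short sketch on a reduced copy of the sample data (the variable names are illustrative):

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["Alice", "Bob", "Charlie", "David", "Eve"],
    "Age": [25, 30, 35, 40, 45],
})

# Rows that are older than 30 AND have a name starting with 'C'
combined = (df["Age"] > 30) & (df["Name"].str.startswith("C"))

# Negating the combined mask selects every other row
everyone_else = df.loc[~combined]
print(everyone_else)
```

Every row lands in exactly one of df.loc[combined] and df.loc[~combined], so the two results together cover the full DataFrame.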

Complex Conditions

Using isin Method

The isin method tests whether each value in a column appears in a list of allowed values, which is shorter than chaining several comparisons with |:

import pandas as pd

# Sample DataFrame
data = {
    'Name': ['Alice', 'Bob', 'Charlie', 'David', 'Eve'],
    'Age': [25, 30, 35, 40, 45],
    'Email': ['[email protected]', '[email protected]', '[email protected]', '[email protected]', '[email protected]']
}
df = pd.DataFrame(data)

# Select rows where name is either 'Alice' or 'Bob'
result = df.loc[df['Name'].isin(['Alice', 'Bob'])]
print(result)
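
isin works on numeric columns as well, and combining it with ~ gives a "not in" filter. A minimal sketch on a reduced copy of the sample data:

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["Alice", "Bob", "Charlie", "David", "Eve"],
    "Age": [25, 30, 35, 40, 45],
})

# Membership test on a numeric column
youngest_and_oldest = df.loc[df["Age"].isin([25, 45])]
print(youngest_and_oldest)

# "Not in": negate the membership mask
not_alice_or_bob = df.loc[~df["Name"].isin(["Alice", "Bob"])]
print(not_alice_or_bob)
```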

Using between Method

The between method selects rows where column values fall within a range. Both endpoints are included by default:

import pandas as pd

# Sample DataFrame
data = {
    'Name': ['Alice', 'Bob', 'Charlie', 'David', 'Eve'],
    'Age': [25, 30, 35, 40, 45],
    'Email': ['[email protected]', '[email protected]', '[email protected]', '[email protected]', '[email protected]']
}
df = pd.DataFrame(data)

# Select rows where age is between 30 and 40 (both endpoints included)
result = df.loc[df['Age'].between(30, 40)]
print(result)
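
Whether the endpoints count is controlled by the inclusive parameter. The sketch below assumes pandas 1.3 or later, where inclusive accepts the strings "both", "neither", "left", and "right"; older versions took a boolean instead:

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["Alice", "Bob", "Charlie", "David", "Eve"],
    "Age": [25, 30, 35, 40, 45],
})

# Default: 30 and 40 are both part of the range
both = df.loc[df["Age"].between(30, 40)]

# Exclude both endpoints (string form assumes pandas >= 1.3)
neither = df.loc[df["Age"].between(30, 40, inclusive="neither")]

# The default is equivalent to two comparisons joined with &
explicit = df.loc[(df["Age"] >= 30) & (df["Age"] <= 40)]

print(both)
print(neither)
```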

Combining Conditions Across Different Columns

You can also combine conditions that involve multiple columns:

import pandas as pd

# Sample DataFrame
data = {
    'Name': ['Alice', 'Bob', 'Charlie', 'David', 'Eve'],
    'Age': [25, 30, 35, 40, 45],
    'Email': ['[email protected]', '[email protected]', '[email protected]', '[email protected]', '[email protected]']
}
df = pd.DataFrame(data)

# Select rows where age is greater than 30 and email contains 'pandasdataframe.com'
# regex=False matches the text literally, so '.' is not treated as a regex wildcard
result = df.loc[(df['Age'] > 30) & (df['Email'].str.contains('pandasdataframe.com', regex=False))]
print(result)
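
Text conditions need extra care when a column has missing values. Depending on the pandas version and the column's dtype, a missing entry can propagate into the mask instead of becoming False, and loc may then reject the mask. The sketch below passes na=False to treat missing entries as non-matches; the contacts frame and its example.com and example.org addresses are hypothetical:

```python
import pandas as pd

# Hypothetical data with one missing email address
contacts = pd.DataFrame({
    "Name": ["Alice", "Bob", "Charlie"],
    "Email": ["alice@example.com", None, "charlie@example.org"],
})

# regex=False: match the text literally
# na=False: count a missing email as "does not contain"
mask = contacts["Email"].str.contains("example.com", regex=False, na=False)

result = contacts.loc[mask]
print(result)
```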

Conclusion

Using loc with conditions gives you a concise way to filter DataFrame rows and pick columns in a single expression. Conditions can be combined with &, |, and ~, and helper methods such as isin, between, and the str accessor cover common membership, range, and text tests. These operations come up constantly in data preprocessing, analysis, and feature engineering.

With these techniques you can explore and filter datasets without writing explicit loops, and adapt the examples above to your own data analysis tasks.