Pandas DataFrame Loc Condition

Pandas is a Python data manipulation library that provides data structures and functions for handling and analyzing large datasets efficiently. One of its key features is the ability to select and modify data based on conditions using the loc attribute. This article explores several ways to use loc with conditions on a Pandas DataFrame, with an example for each use case.

Introduction to Pandas DataFrame

A DataFrame is a two-dimensional, size-mutable, potentially heterogeneous tabular data structure with labeled axes (rows and columns). It is one of the most commonly used Pandas objects for data manipulation and analysis.

Before diving into the examples, let’s first import the Pandas library and create a sample DataFrame to work with:

import pandas as pd

# Sample DataFrame
data = {
    'Website': ['pandasdataframe.com', 'example.com', 'testsite.com'],
    'Visits': [1000, 1500, 800],
    'Revenue': [200, 300, 150]
}
df = pd.DataFrame(data)
print(df)

Basic Usage of loc

The loc attribute provides label-based indexing: you select rows and columns by their labels. It also accepts a boolean array or Series in place of the row labels, which is what makes conditional selection possible. The basic syntax is:

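import pandas as pd

# Sample DataFrame
data = {
    'Website': ['pandasdataframe.com', 'example.com', 'testsite.com'],
    'Visits': [1000, 1500, 800],
    'Revenue': [200, 300, 150]
}
df = pd.DataFrame(data)

# General form: df.loc[row_labels, column_labels]
# Here: the rows labeled 0 and 1, and the 'Website' and 'Visits' columns
result = df.loc[[0, 1], ['Website', 'Visits']]
print(result)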

Example 1: Selecting Rows by Condition

To select rows where the number of visits is greater than 900, you can use:

import pandas as pd

# Sample DataFrame
data = {
    'Website': ['pandasdataframe.com', 'example.com', 'testsite.com'],
    'Visits': [1000, 1500, 800],
    'Revenue': [200, 300, 150]
}
df = pd.DataFrame(data)

result = df.loc[df['Visits'] > 900, :]
print(result)
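It can help to look at the condition on its own. The sketch below, which reuses the sample data and introduces the illustrative names mask and selected, shows that the comparison produces a boolean Series and that loc keeps the rows where it is True.

```python
import pandas as pd

df = pd.DataFrame({
    'Website': ['pandasdataframe.com', 'example.com', 'testsite.com'],
    'Visits': [1000, 1500, 800],
    'Revenue': [200, 300, 150],
})

# The comparison is evaluated row by row and yields a boolean Series
mask = df['Visits'] > 900
print(mask.tolist())  # [True, True, False]

# loc keeps only the rows whose mask value is True
selected = df.loc[mask]
print(selected)
```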

Example 2: Selecting Specific Columns by Condition

If you only want to see the ‘Website’ column for rows where ‘Visits’ is greater than 900:

import pandas as pd

# Sample DataFrame
data = {
    'Website': ['pandasdataframe.com', 'example.com', 'testsite.com'],
    'Visits': [1000, 1500, 800],
    'Revenue': [200, 300, 150]
}
df = pd.DataFrame(data)

result = df.loc[df['Visits'] > 900, 'Website']
print(result)
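The type of the result depends on how the column is specified. As a small sketch on the same sample data (the variable names are illustrative), a single column label returns a Series, while a list of labels returns a DataFrame.

```python
import pandas as pd

df = pd.DataFrame({
    'Website': ['pandasdataframe.com', 'example.com', 'testsite.com'],
    'Visits': [1000, 1500, 800],
    'Revenue': [200, 300, 150],
})

mask = df['Visits'] > 900

# A single label gives back a Series
series_result = df.loc[mask, 'Website']

# A list of labels gives back a DataFrame, even if the list has one item
frame_result = df.loc[mask, ['Website', 'Revenue']]

print(type(series_result).__name__)
print(type(frame_result).__name__)
```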

Example 3: Combining Conditions

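You can combine multiple conditions using the & (and) and | (or) operators. Wrap each condition in parentheses, because & and | bind more tightly than comparison operators such as >. Here’s how to select rows where visits are more than 900 and revenue is more than 250: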

import pandas as pd

# Sample DataFrame
data = {
    'Website': ['pandasdataframe.com', 'example.com', 'testsite.com'],
    'Visits': [1000, 1500, 800],
    'Revenue': [200, 300, 150]
}
df = pd.DataFrame(data)

result = df.loc[(df['Visits'] > 900) & (df['Revenue'] > 250), :]
print(result)
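The | operator works the same way. A minimal sketch on the sample data, with thresholds chosen only for illustration, selects rows that satisfy at least one of two conditions:

```python
import pandas as pd

df = pd.DataFrame({
    'Website': ['pandasdataframe.com', 'example.com', 'testsite.com'],
    'Visits': [1000, 1500, 800],
    'Revenue': [200, 300, 150],
})

# Keep rows with more than 1200 visits OR less than 200 revenue
either = df.loc[(df['Visits'] > 1200) | (df['Revenue'] < 200), :]
print(either)
```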

Example 4: Using isin with loc

The isin method can be used to filter data based on a list of values. Here’s how to select rows where the website is either ‘pandasdataframe.com’ or ‘example.com’:

import pandas as pd

# Sample DataFrame
data = {
    'Website': ['pandasdataframe.com', 'example.com', 'testsite.com'],
    'Visits': [1000, 1500, 800],
    'Revenue': [200, 300, 150]
}
df = pd.DataFrame(data)

websites = ['pandasdataframe.com', 'example.com']
result = df.loc[df['Website'].isin(websites), :]
print(result)
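Because isin returns a boolean Series, it can be combined with other conditions like any other mask. The following sketch assumes the same sample data and an illustrative threshold of 1200 visits:

```python
import pandas as pd

df = pd.DataFrame({
    'Website': ['pandasdataframe.com', 'example.com', 'testsite.com'],
    'Visits': [1000, 1500, 800],
    'Revenue': [200, 300, 150],
})

wanted = ['pandasdataframe.com', 'example.com']

# Membership test combined with a numeric condition, limited to two columns
busy_wanted = df.loc[
    df['Website'].isin(wanted) & (df['Visits'] > 1200),
    ['Website', 'Visits'],
]
print(busy_wanted)
```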

Example 5: Using str Methods with loc

Pandas offers powerful string methods that can be used with loc. For example, to select rows where the website ends with ‘.com’:

import pandas as pd

# Sample DataFrame
data = {
    'Website': ['pandasdataframe.com', 'example.com', 'testsite.com'],
    'Visits': [1000, 1500, 800],
    'Revenue': [200, 300, 150]
}
df = pd.DataFrame(data)

result = df.loc[df['Website'].str.endswith('.com'), :]
print(result)
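One thing to watch for with string methods is missing data: if the column contains missing values, the resulting mask can contain missing values too, and loc will not accept it. The sketch below uses a modified version of the sample data (one name changed to end in '.org' and one set to None, both assumptions made for illustration) and passes na=False so that missing entries count as non-matches.

```python
import pandas as pd

df = pd.DataFrame({
    'Website': ['pandasdataframe.com', 'example.org', None],
    'Visits': [1000, 1500, 800],
})

# na=False treats missing website names as "does not match"
mask = df['Website'].str.endswith('.com', na=False)
com_sites = df.loc[mask, :]
print(com_sites)
```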

Example 6: Updating Data Based on a Condition

You can also update values in a DataFrame based on a condition. Here’s how to increase the visits by 10% for websites with more than 1000 visits:

import pandas as pd

# Sample DataFrame
data = {
    'Website': ['pandasdataframe.com', 'example.com', 'testsite.com'],
    'Visits': [1000, 1500, 800],
    'Revenue': [200, 300, 150]
}
df = pd.DataFrame(data)

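# Cast to float first so the scaled values match the column's dtype
df['Visits'] = df['Visits'].astype(float)
df.loc[df['Visits'] > 1000, 'Visits'] *= 1.10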
print(df)
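The same pattern can fill a different column from the one being tested. As a sketch, the hypothetical 'Tier' column below (not part of the original data) starts with a default value and is then overwritten for the rows that meet the condition.

```python
import pandas as pd

df = pd.DataFrame({
    'Website': ['pandasdataframe.com', 'example.com', 'testsite.com'],
    'Visits': [1000, 1500, 800],
    'Revenue': [200, 300, 150],
})

# Hypothetical label column: default first, then override matching rows
df['Tier'] = 'standard'
df.loc[df['Visits'] > 900, 'Tier'] = 'high'
print(df)
```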

Example 7: Selecting Rows with Null/NaN Values

To select rows where a certain column has null (or NaN) values, you can use:

import pandas as pd

# Sample DataFrame
data = {
    'Website': ['pandasdataframe.com', 'example.com', 'testsite.com'],
    'Visits': [1000, 1500, 800],
    'Revenue': [200, 300, 150]
}
df = pd.DataFrame(data)

# Adding a row with NaN values for demonstration
df.loc[3] = ['newsite.com', pd.NA, pd.NA]

result = df.loc[df['Revenue'].isna(), :]
print(result)
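The opposite selection, rows where the value is present, can be sketched with notna. The data below is an assumed variant of the sample in which the last row has no revenue figure.

```python
import pandas as pd

df = pd.DataFrame({
    'Website': ['pandasdataframe.com', 'example.com', 'testsite.com', 'newsite.com'],
    'Revenue': [200.0, 300.0, 150.0, None],  # None becomes NaN in a float column
})

# Keep only the rows that have a revenue value
complete = df.loc[df['Revenue'].notna(), :]
print(complete)
```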

Example 8: Excluding Certain Rows

To exclude rows that meet a certain condition, you can use the ~ operator, which acts as a negation:

import pandas as pd

# Sample DataFrame
data = {
    'Website': ['pandasdataframe.com', 'example.com', 'testsite.com'],
    'Visits': [1000, 1500, 800],
    'Revenue': [200, 300, 150]
}
df = pd.DataFrame(data)

result = df.loc[~(df['Website'] == 'pandasdataframe.com'), :]
print(result)
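For a single value, negating an equality test gives the same rows as using != directly. The ~ operator becomes more useful with masks that have no built-in opposite, such as isin. A short sketch on the sample data (variable names are illustrative):

```python
import pandas as pd

df = pd.DataFrame({
    'Website': ['pandasdataframe.com', 'example.com', 'testsite.com'],
    'Visits': [1000, 1500, 800],
    'Revenue': [200, 300, 150],
})

# These two selections return the same rows
negated = df.loc[~(df['Website'] == 'pandasdataframe.com'), :]
not_equal = df.loc[df['Website'] != 'pandasdataframe.com', :]

# Excluding several values at once by negating isin
excluded = ['pandasdataframe.com', 'testsite.com']
remaining = df.loc[~df['Website'].isin(excluded), :]
print(remaining)
```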

Example 9: Using query Method with loc

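Pandas also provides a query method that selects rows based on a query string. Note that query already returns the matching rows on its own; passing its index to loc, as below, is mainly useful when you also want to pick specific columns or assign values: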

import pandas as pd

# Sample DataFrame
data = {
    'Website': ['pandasdataframe.com', 'example.com', 'testsite.com'],
    'Visits': [1000, 1500, 800],
    'Revenue': [200, 300, 150]
}
df = pd.DataFrame(data)

result = df.loc[df.query('Visits > 900').index, :]
print(result)
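For plain row filtering, query can also be used directly, and a local Python variable can be referenced inside the query string with the @ prefix. The sketch below, where threshold is an illustrative variable, compares that with the equivalent loc selection.

```python
import pandas as pd

df = pd.DataFrame({
    'Website': ['pandasdataframe.com', 'example.com', 'testsite.com'],
    'Visits': [1000, 1500, 800],
    'Revenue': [200, 300, 150],
})

threshold = 900

# @threshold refers to the local Python variable defined above
via_query = df.query('Visits > @threshold')

# The same selection written as a boolean mask with loc
via_loc = df.loc[df['Visits'] > threshold, :]

print(via_query.equals(via_loc))
```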

Example 10: Selecting Rows with Multiple Conditions on Different Columns

Here’s how to select rows where the number of visits is more than 900 and the revenue is less than 300:

import pandas as pd

# Sample DataFrame
data = {
    'Website': ['pandasdataframe.com', 'example.com', 'testsite.com'],
    'Visits': [1000, 1500, 800],
    'Revenue': [200, 300, 150]
}
df = pd.DataFrame(data)

result = df.loc[(df['Visits'] > 900) & (df['Revenue'] < 300), :]
print(result)
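When several conditions are involved, one option is to give each mask a name before combining them, which keeps the loc call short. This is only a readability sketch; the names high_traffic and modest_revenue are made up for the example.

```python
import pandas as pd

df = pd.DataFrame({
    'Website': ['pandasdataframe.com', 'example.com', 'testsite.com'],
    'Visits': [1000, 1500, 800],
    'Revenue': [200, 300, 150],
})

# Name each condition, then combine the named masks
high_traffic = df['Visits'] > 900
modest_revenue = df['Revenue'] < 300

matches = df.loc[high_traffic & modest_revenue, ['Website', 'Revenue']]
print(matches)
```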

Conclusion

The loc attribute in Pandas is a versatile tool for selecting and manipulating data based on conditions. It allows for label-based indexing and can be combined with various methods to perform complex data filtering and manipulation tasks efficiently. By mastering the use of loc with conditions, you can significantly enhance your data analysis capabilities in Python using Pandas.