Pandas DataFrame Loc Condition
Pandas is a Python data manipulation library that provides data structures and functions for handling and analyzing tabular data. One of its key features is the ability to select and modify data based on conditions using the loc attribute. This article explores several ways to use the loc attribute with conditions in Pandas DataFrames, with an example for each use case.
Introduction to Pandas DataFrame
A DataFrame is a two-dimensional, size-mutable, potentially heterogeneous tabular data structure with labeled axes (rows and columns). It is one of the most commonly used Pandas objects for data manipulation and analysis.
Before diving into the examples, let’s first import the Pandas library and create a sample DataFrame to work with:
import pandas as pd
# Sample DataFrame
data = {
'Website': ['pandasdataframe.com', 'example.com', 'testsite.com'],
'Visits': [1000, 1500, 800],
'Revenue': [200, 300, 150]
}
df = pd.DataFrame(data)
print(df)
Basic Usage of loc
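The loc attribute provides label-based indexing: you select rows and columns by their labels, or by a boolean mask aligned with the index. The basic syntax is df.loc[row_labels, column_labels], for example:
import pandas as pd
# Sample DataFrame
data = {
'Website': ['pandasdataframe.com', 'example.com', 'testsite.com'],
'Visits': [1000, 1500, 800],
'Revenue': [200, 300, 150]
}
df = pd.DataFrame(data)
# General form: df.loc[row_labels, column_labels]
# Rows labeled 0 and 1, columns 'Website' and 'Visits'
result = df.loc[[0, 1], ['Website', 'Visits']]
print(result)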
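Because loc works on labels rather than positions, its behavior is easiest to see with a non-default index. The sketch below is an illustration only: it assumes the website names are used as the index, which the sample DataFrame in this article does not do.

```python
import pandas as pd

# Assumption for illustration: website names serve as row labels.
df = pd.DataFrame(
    {"Visits": [1000, 1500, 800], "Revenue": [200, 300, 150]},
    index=["pandasdataframe.com", "example.com", "testsite.com"],
)

# A single row label and a single column label return one value.
visits = df.loc["example.com", "Visits"]

# Label slices with loc include both endpoints.
first_two = df.loc["pandasdataframe.com":"example.com", :]

print(visits)
print(first_two)
```

Note that the slice returns two rows: unlike positional slicing, the end label is part of the result.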
Example 1: Selecting Rows by Condition
To select rows where the number of visits is greater than 900, you can use:
import pandas as pd
# Sample DataFrame
data = {
'Website': ['pandasdataframe.com', 'example.com', 'testsite.com'],
'Visits': [1000, 1500, 800],
'Revenue': [200, 300, 150]
}
df = pd.DataFrame(data)
result = df.loc[df['Visits'] > 900, :]
print(result)
Example 2: Selecting Specific Columns by Condition
If you only want to see the ‘Website’ column for rows where ‘Visits’ is greater than 900:
import pandas as pd
# Sample DataFrame
data = {
'Website': ['pandasdataframe.com', 'example.com', 'testsite.com'],
'Visits': [1000, 1500, 800],
'Revenue': [200, 300, 150]
}
df = pd.DataFrame(data)
result = df.loc[df['Visits'] > 900, 'Website']
print(result)
Example 3: Combining Conditions
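You can combine multiple conditions using the & (and) and | (or) operators. Wrap each condition in parentheses, because & and | bind more tightly than comparison operators such as >. Here’s how to select rows where visits are more than 900 and revenue is more than 250: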
import pandas as pd
# Sample DataFrame
data = {
'Website': ['pandasdataframe.com', 'example.com', 'testsite.com'],
'Visits': [1000, 1500, 800],
'Revenue': [200, 300, 150]
}
df = pd.DataFrame(data)
result = df.loc[(df['Visits'] > 900) & (df['Revenue'] > 250), :]
print(result)
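The | operator works the same way as &. As a minimal sketch using the same sample data, with thresholds chosen only for illustration, the following selects rows where visits exceed 1200 or revenue is below 200:

```python
import pandas as pd

df = pd.DataFrame({
    "Website": ["pandasdataframe.com", "example.com", "testsite.com"],
    "Visits": [1000, 1500, 800],
    "Revenue": [200, 300, 150],
})

# A row is kept if either condition is true; each condition needs its own parentheses.
result = df.loc[(df["Visits"] > 1200) | (df["Revenue"] < 200), :]
print(result)
```

Here example.com matches the first condition and testsite.com matches the second, so both rows are returned.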
Example 4: Using isin with loc
The isin method filters data against a list of values. Here’s how to select rows where the website is either ‘pandasdataframe.com’ or ‘example.com’:
import pandas as pd
# Sample DataFrame
data = {
'Website': ['pandasdataframe.com', 'example.com', 'testsite.com'],
'Visits': [1000, 1500, 800],
'Revenue': [200, 300, 150]
}
df = pd.DataFrame(data)
websites = ['pandasdataframe.com', 'example.com']
result = df.loc[df['Website'].isin(websites), :]
print(result)
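Example 5: Using str Methods with loc
Pandas offers string methods through the str accessor, and the boolean results can be passed to loc. For example, to select rows where the website ends with ‘.com’ (in this sample every website does, so all three rows are returned):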
import pandas as pd
# Sample DataFrame
data = {
'Website': ['pandasdataframe.com', 'example.com', 'testsite.com'],
'Visits': [1000, 1500, 800],
'Revenue': [200, 300, 150]
}
df = pd.DataFrame(data)
result = df.loc[df['Website'].str.endswith('.com'), :]
print(result)
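String methods need some care when the column contains missing values, since a missing entry is neither a match nor a non-match. The sketch below assumes a variant of the sample data with one missing website (not part of the original example) and passes na=False to str.contains so that missing entries are treated as non-matches:

```python
import pandas as pd

# Assumption for illustration: one website name is missing.
df = pd.DataFrame({
    "Website": ["pandasdataframe.com", "example.org", None],
    "Visits": [1000, 1500, 800],
})

# na=False makes the mask False wherever the website is missing.
mask = df["Website"].str.contains("pandas", na=False)
result = df.loc[mask, :]
print(result)
```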
Example 6: Updating Data Based on a Condition
You can also update values in a DataFrame based on a condition. Here’s how to increase the visits by 10% for websites with more than 1000 visits:
import pandas as pd
# Sample DataFrame
data = {
'Website': ['pandasdataframe.com', 'example.com', 'testsite.com'],
'Visits': [1000, 1500, 800],
'Revenue': [200, 300, 150]
}
df = pd.DataFrame(data)
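# Cast to float first: multiplying by 1.10 produces non-integer values,
# which do not fit the column's original integer dtype
df['Visits'] = df['Visits'].astype(float)
df.loc[df['Visits'] > 1000, 'Visits'] *= 1.10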
print(df)
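The same pattern can fill a column with a label instead of scaling a number. The following is a minimal sketch; the column name Tier and its values are hypothetical and chosen only for illustration:

```python
import pandas as pd

df = pd.DataFrame({
    "Website": ["pandasdataframe.com", "example.com", "testsite.com"],
    "Visits": [1000, 1500, 800],
    "Revenue": [200, 300, 150],
})

# Start with a default value, then overwrite only the rows that match the condition.
df["Tier"] = "standard"
df.loc[df["Visits"] > 900, "Tier"] = "high"
print(df)
```

Assigning through loc in a single step modifies df itself, which avoids the ambiguity of chained indexing such as df[mask]['Tier'] = 'high'.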
Example 7: Selecting Rows with Null/NaN Values
To select rows where a certain column has null (or NaN) values, you can use:
import pandas as pd
# Sample DataFrame
data = {
'Website': ['pandasdataframe.com', 'example.com', 'testsite.com'],
'Visits': [1000, 1500, 800],
'Revenue': [200, 300, 150]
}
df = pd.DataFrame(data)
# Adding a row with NaN values for demonstration
df.loc[3] = ['newsite.com', pd.NA, pd.NA]
result = df.loc[df['Revenue'].isna(), :]
print(result)
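The opposite filter, keeping only rows where a value is present, uses notna. A small sketch, assuming a version of the sample data in which the last revenue figure is missing:

```python
import numpy as np
import pandas as pd

# Assumption for illustration: the fourth site has no revenue recorded.
df = pd.DataFrame({
    "Website": ["pandasdataframe.com", "example.com", "testsite.com", "newsite.com"],
    "Revenue": [200.0, 300.0, 150.0, np.nan],
})

# Keep only the rows where Revenue is present.
complete = df.loc[df["Revenue"].notna(), :]
print(complete)
```

For this case, df.dropna(subset=['Revenue']) gives the same rows; the loc form is convenient when the mask is combined with other conditions.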
Example 8: Excluding Certain Rows
To exclude rows that meet a certain condition, you can use the ~ operator, which negates a boolean mask:
import pandas as pd
# Sample DataFrame
data = {
'Website': ['pandasdataframe.com', 'example.com', 'testsite.com'],
'Visits': [1000, 1500, 800],
'Revenue': [200, 300, 150]
}
df = pd.DataFrame(data)
result = df.loc[~(df['Website'] == 'pandasdataframe.com'), :]
print(result)
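Negation is often paired with isin to drop several values at once. A minimal sketch on the same sample data, where the excluded list is chosen only for illustration:

```python
import pandas as pd

df = pd.DataFrame({
    "Website": ["pandasdataframe.com", "example.com", "testsite.com"],
    "Visits": [1000, 1500, 800],
    "Revenue": [200, 300, 150],
})

excluded = ["pandasdataframe.com", "example.com"]

# ~ flips the isin mask, so only websites NOT in the list remain.
result = df.loc[~df["Website"].isin(excluded), :]
print(result)
```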
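Example 9: Using query Method with loc
Pandas also provides a query method that selects rows using an expression string. query already returns the matching rows on its own; passing the index of its result to loc, as below, is mainly useful when you also want to choose columns in the same step: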
import pandas as pd
# Sample DataFrame
data = {
'Website': ['pandasdataframe.com', 'example.com', 'testsite.com'],
'Visits': [1000, 1500, 800],
'Revenue': [200, 300, 150]
}
df = pd.DataFrame(data)
result = df.loc[df.query('Visits > 900').index, :]
print(result)
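To make the relationship between the two approaches concrete, the sketch below compares query with an equivalent boolean mask. The variable name threshold is introduced here for illustration; inside the query string, a local Python variable is referenced with the @ prefix.

```python
import pandas as pd

df = pd.DataFrame({
    "Website": ["pandasdataframe.com", "example.com", "testsite.com"],
    "Visits": [1000, 1500, 800],
    "Revenue": [200, 300, 150],
})

threshold = 900  # hypothetical cutoff, referenced as @threshold in the query

by_query = df.query("Visits > @threshold")
by_loc = df.loc[df["Visits"] > threshold, :]

# Both approaches select the same rows.
print(by_query.equals(by_loc))
```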
Example 10: Selecting Rows with Multiple Conditions on Different Columns
Here’s how to select rows where the number of visits is more than 900 and the revenue is less than 300:
import pandas as pd
# Sample DataFrame
data = {
'Website': ['pandasdataframe.com', 'example.com', 'testsite.com'],
'Visits': [1000, 1500, 800],
'Revenue': [200, 300, 150]
}
df = pd.DataFrame(data)
result = df.loc[(df['Visits'] > 900) & (df['Revenue'] < 300), :]
print(result)
Conclusion
The loc attribute in Pandas is a versatile tool for selecting and manipulating data based on conditions. It provides label-based indexing and combines with boolean masks, isin, string methods, and query to handle complex filtering and update tasks. With these patterns, you can filter, inspect, and update subsets of a DataFrame in a single, readable expression.