Pandas DataFrame loc
Pandas is a powerful data manipulation library for Python that provides data structures and functions for efficiently handling and analyzing large datasets. One of its key features is the DataFrame, a two-dimensional, size-mutable, potentially heterogeneous tabular data structure with labeled axes (rows and columns). In this article, we explore the loc attribute of the Pandas DataFrame, which is used to access a group of rows and columns by labels or by a boolean array.
Understanding DataFrame loc
The loc attribute is part of the DataFrame's indexing capabilities. It is primarily label based: rows and columns are selected by their index and column labels, not by their integer positions. It also accepts a boolean array, typically produced by a condition, and keeps the rows where that array is True.
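To make "label based" concrete, the following sketch uses a small hypothetical DataFrame whose index holds the integers 10, 20 and 30. With loc, 20 means "the row labeled 20", and 0 is rejected because no row carries that label, even though position 0 exists. The last selection passes a plain boolean list as the row selector.

```python
import pandas as pd

# Hypothetical data: integer labels that do not match row positions
df = pd.DataFrame({'Visits': [1000, 1500, 800]}, index=[10, 20, 30])

# loc interprets 20 as an index label, so this reads the second row
by_label = df.loc[20, 'Visits']
print(by_label)  # 1500

# 0 is a valid position but not a label in this index, so loc raises KeyError
try:
    df.loc[0]
    zero_is_label = True
except KeyError:
    zero_is_label = False
print(zero_is_label)  # False

# A boolean list with one entry per row also works with loc
masked = df.loc[[True, False, True]]
print(masked.index.tolist())  # [10, 30]
```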
Basic Usage of loc
The basic syntax of loc is:
dataframe.loc[row_labels, column_labels]
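Here, row_labels and column_labels can each be a single label, a list of labels, a slice of labels, a boolean array, or a callable (see Example 8). The column part is optional: dataframe.loc[row_labels] returns the selected rows with all of their columns.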
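As a quick reference for the forms listed above, this sketch applies each of them to the sample DataFrame used throughout this article. The variable names are illustrative; a colon on its own (:) selects every row, just as it does in an ordinary Python slice.

```python
import pandas as pd

data = {
    'Website': ['pandasdataframe.com', 'example.com', 'test.com'],
    'Visits': [1000, 1500, 800]
}
df = pd.DataFrame(data, index=['A', 'B', 'C'])

# Single label on both axes -> a single value
scalar = df.loc['B', 'Visits']

# ':' selects every row; a single column label -> a Series
visits = df.loc[:, 'Visits']

# A label slice of rows plus a list of column labels -> a DataFrame
subset = df.loc['B':'C', ['Website', 'Visits']]

# A boolean list on the column axis keeps only the columns marked True
website_only = df.loc[:, [True, False]]

print(scalar)
print(visits)
print(subset)
print(website_only)
```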
Example 1: Selecting a single row by index label
import pandas as pd
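data = {
    'Website': ['pandasdataframe.com', 'example.com', 'test.com'],
    'Visits': [1000, 1500, 800]
}
df = pd.DataFrame(data, index=['A', 'B', 'C'])
# A single row label returns that row as a Series indexed by the column names
result = df.loc['A']
print(result)
Output: a Series named 'A' with Website = 'pandasdataframe.com' and Visits = 1000 (dtype object, because the row mixes strings and integers).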
Example 2: Selecting multiple rows by index label
import pandas as pd
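data = {
    'Website': ['pandasdataframe.com', 'example.com', 'test.com'],
    'Visits': [1000, 1500, 800]
}
df = pd.DataFrame(data, index=['A', 'B', 'C'])
# A list of row labels returns a DataFrame containing those rows, in the order given
result = df.loc[['A', 'B']]
print(result)
Output: a DataFrame with rows A and B and both columns.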
Example 3: Selecting rows by slice of index labels
import pandas as pd
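data = {
    'Website': ['pandasdataframe.com', 'example.com', 'test.com'],
    'Visits': [1000, 1500, 800]
}
df = pd.DataFrame(data, index=['A', 'B', 'C'])
# Unlike a regular Python slice, a label slice includes both endpoints
result = df.loc['A':'B']
print(result)
Output: a DataFrame with rows A and B; row B is included because label slices are inclusive.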
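The inclusive end point is the main difference from positional slicing. This sketch contrasts loc with iloc, the position-based indexer, on the same sample data, and also shows a label slice with no start label.

```python
import pandas as pd

data = {
    'Website': ['pandasdataframe.com', 'example.com', 'test.com'],
    'Visits': [1000, 1500, 800]
}
df = pd.DataFrame(data, index=['A', 'B', 'C'])

# Label-based slicing with loc includes BOTH endpoints
label_slice = df.loc['A':'B']

# Position-based slicing with iloc excludes the stop position, like a Python list
position_slice = df.iloc[0:1]

# Omitting the start label slices from the first row up to and including 'B'
from_start = df.loc[:'B']

print(label_slice.index.tolist())     # ['A', 'B']
print(position_slice.index.tolist())  # ['A']
print(from_start.index.tolist())      # ['A', 'B']
```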
Example 4: Selecting specific rows and columns
import pandas as pd
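data = {
    'Website': ['pandasdataframe.com', 'example.com', 'test.com'],
    'Visits': [1000, 1500, 800]
}
df = pd.DataFrame(data, index=['A', 'B', 'C'])
# Rows A and C of the Website column; a single column label returns a Series
result = df.loc[['A', 'C'], 'Website']
print(result)
Output: a Series named 'Website' with A = 'pandasdataframe.com' and C = 'test.com'.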
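Whether loc returns a single value, a Series or a DataFrame depends on whether each axis is given a single label or a list. The sketch below, on the same sample data, shows the cases side by side; the same rule explains why Example 1 returns a Series while df.loc[['A']] returns a one-row DataFrame.

```python
import pandas as pd

data = {
    'Website': ['pandasdataframe.com', 'example.com', 'test.com'],
    'Visits': [1000, 1500, 800]
}
df = pd.DataFrame(data, index=['A', 'B', 'C'])

# Scalar column label -> Series indexed by the selected row labels
as_series = df.loc[['A', 'C'], 'Website']

# Column label wrapped in a list -> one-column DataFrame
as_frame = df.loc[['A', 'C'], ['Website']]

# The same rule applies to rows: a one-element list keeps a DataFrame
one_row_frame = df.loc[['A']]

# Scalar row and column labels -> a single value
cell = df.loc['C', 'Website']

print(type(as_series).__name__)      # Series
print(type(as_frame).__name__)       # DataFrame
print(type(one_row_frame).__name__)  # DataFrame
print(cell)                          # test.com
```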
Example 5: Selecting rows by boolean array
import pandas as pd
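data = {
    'Website': ['pandasdataframe.com', 'example.com', 'test.com'],
    'Visits': [1000, 1500, 800]
}
df = pd.DataFrame(data, index=['A', 'B', 'C'])
# The comparison yields a boolean Series; loc keeps the rows where it is True
result = df.loc[df['Visits'] > 900]
print(result)
Output: a DataFrame with rows A (1000 visits) and B (1500 visits); row C (800 visits) is filtered out.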
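Storing the condition in a variable makes it easy to inspect and reuse. The sketch below, on the same sample data, prints the boolean Series that df['Visits'] > 900 produces and then uses ~ to select the rows that fail the condition. The name mask is just an illustrative choice.

```python
import pandas as pd

data = {
    'Website': ['pandasdataframe.com', 'example.com', 'test.com'],
    'Visits': [1000, 1500, 800]
}
df = pd.DataFrame(data, index=['A', 'B', 'C'])

# The comparison produces a boolean Series aligned on the index
mask = df['Visits'] > 900
print(mask.tolist())  # [True, True, False]

# Rows where the condition holds
high_traffic = df.loc[mask]

# ~ inverts the mask, selecting the rows where the condition fails
low_traffic = df.loc[~mask]

print(high_traffic.index.tolist())  # ['A', 'B']
print(low_traffic.index.tolist())   # ['C']
```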
Advanced Usage of loc
The loc attribute also supports more advanced techniques, such as combining a row condition with a column selection and assigning new values to the selected cells.
Example 6: Conditional selection with loc
import pandas as pd
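data = {
    'Website': ['pandasdataframe.com', 'example.com', 'test.com'],
    'Visits': [1000, 1500, 800]
}
df = pd.DataFrame(data, index=['A', 'B', 'C'])
# Filter rows with a condition and keep only the Website column;
# passing ['Website'] as a list keeps the result a DataFrame
result = df.loc[df['Visits'] > 900, ['Website']]
print(result)
Output: a one-column DataFrame (Website) with rows A and B.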
Example 7: Setting values in selected rows
import pandas as pd
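data = {
    'Website': ['pandasdataframe.com', 'example.com', 'test.com'],
    'Visits': [1000, 1500, 800]
}
df = pd.DataFrame(data, index=['A', 'B', 'C'])
# Set Visits to 2000 in every row with more than 900 visits; df is modified in place
df.loc[df['Visits'] > 900, 'Visits'] = 2000
print(df)
Output: rows A and B now show 2000 visits, while row C keeps its original 800.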
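Assignment through loc is not limited to constants. The sketch below, again on the sample data, doubles the existing Visits values of the matching rows and then updates a single cell; the replacement value 'test.org' is made up for illustration. Doing each update through a single df.loc[...] call is the recommended pattern, whereas a chained form such as df[mask]['Visits'] = ... may act on a temporary copy and leave df unchanged.

```python
import pandas as pd

data = {
    'Website': ['pandasdataframe.com', 'example.com', 'test.com'],
    'Visits': [1000, 1500, 800]
}
df = pd.DataFrame(data, index=['A', 'B', 'C'])

# Compute the mask once, before any values change
mask = df['Visits'] > 900

# Update based on existing values: double Visits for the matching rows only
df.loc[mask, 'Visits'] = df.loc[mask, 'Visits'] * 2

# Update one cell by row label and column label (hypothetical new value)
df.loc['C', 'Website'] = 'test.org'

print(df)
```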
Example 8: Using loc with a callable
import pandas as pd
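data = {
    'Website': ['pandasdataframe.com', 'example.com', 'test.com'],
    'Visits': [1000, 1500, 800]
}
df = pd.DataFrame(data, index=['A', 'B', 'C'])
# The callable receives the DataFrame and must return a valid indexer,
# here a boolean Series
result = df.loc[lambda x: x['Visits'] > 900]
print(result)
Output: the same rows as Example 5, A and B.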
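Callables are most useful in method chains, where the intermediate DataFrame has no variable name. In the sketch below, VisitsK (visits in thousands) is a hypothetical derived column created by assign; the lambda receives the assigned DataFrame, so it can filter on a column that does not exist in df itself. The second selection places a callable in the column position to keep the columns whose names start with 'V'.

```python
import pandas as pd

data = {
    'Website': ['pandasdataframe.com', 'example.com', 'test.com'],
    'Visits': [1000, 1500, 800]
}
df = pd.DataFrame(data, index=['A', 'B', 'C'])

result = (
    df.assign(VisitsK=df['Visits'] / 1000)  # hypothetical derived column
    .loc[lambda d: d['VisitsK'] > 0.9]      # d is the DataFrame returned by assign
)

# A callable in the column position must return valid column labels
cols = df.loc[:, lambda d: [c for c in d.columns if c.startswith('V')]]

print(result)
print(cols)
```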
Complex Queries with loc
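The loc attribute can perform complex queries by combining multiple conditions with the element-wise operators & (and), | (or) and ~ (not). Wrap each condition in parentheses, because & and | bind more tightly than comparison operators such as >.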
Example 9: Combining conditions
import pandas as pd
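data = {
    'Website': ['pandasdataframe.com', 'example.com', 'test.com'],
    'Visits': [1000, 1500, 800]
}
df = pd.DataFrame(data, index=['A', 'B', 'C'])
# Both conditions must hold; regex=False matches the text literally,
# so the '.' in the domain is not treated as a regex wildcard
result = df.loc[(df['Visits'] > 900) & (df['Website'].str.contains('pandasdataframe.com', regex=False))]
print(result)
Output: only row A, the single row with more than 900 visits whose Website contains 'pandasdataframe.com'.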
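The other two operators follow the same pattern. This sketch, on the sample data, selects rows matching either of two conditions with | and negates a condition with ~. The threshold values 1200 are arbitrary numbers chosen for illustration.

```python
import pandas as pd

data = {
    'Website': ['pandasdataframe.com', 'example.com', 'test.com'],
    'Visits': [1000, 1500, 800]
}
df = pd.DataFrame(data, index=['A', 'B', 'C'])

# | keeps rows where at least one condition holds; the parentheses are required
either = df.loc[(df['Visits'] > 1200) | (df['Website'] == 'test.com')]

# ~ negates the parenthesized condition before it is combined with &
not_example = df.loc[~(df['Website'] == 'example.com') & (df['Visits'] < 1200)]

print(either.index.tolist())       # ['B', 'C']
print(not_example.index.tolist())  # ['A', 'C']
```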
Example 10: Using loc with isin
import pandas as pd
data = {
    'Website': ['pandasdataframe.com', 'example.com', 'test.com'],
    'Visits': [1000, 1500, 800]
}
df = pd.DataFrame(data, index=['A', 'B', 'C'])
# isin returns True for each row whose Website appears in the list
result = df.loc[df['Website'].isin(['pandasdataframe.com', 'test.com'])]
print(result)
Output: a DataFrame with rows A and C.
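Prefixing the isin mask with ~ selects the complementary rows, and isin is also available on the index itself. A short sketch on the same sample data:

```python
import pandas as pd

data = {
    'Website': ['pandasdataframe.com', 'example.com', 'test.com'],
    'Visits': [1000, 1500, 800]
}
df = pd.DataFrame(data, index=['A', 'B', 'C'])

# ~ inverts the isin mask: rows whose Website is NOT in the list
excluded = df.loc[~df['Website'].isin(['pandasdataframe.com', 'test.com'])]

# Index.isin returns a boolean array over the row labels
by_index = df.loc[df.index.isin(['B', 'C'])]

print(excluded.index.tolist())  # ['B']
print(by_index.index.tolist())  # ['B', 'C']
```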
Summary
In this article, we explored the loc attribute of the Pandas DataFrame, a powerful tool for selecting data based on labels and conditions. We covered basic and advanced usage scenarios, including selecting rows and columns, setting values, and performing complex queries. The examples demonstrate the flexibility and utility of loc in data manipulation tasks.
By mastering loc, you can filter and update large datasets concisely, selecting exactly the rows and columns that meet complex criteria. Whether you are a data scientist, analyst, or enthusiast, knowing how to use loc effectively is an essential part of your Pandas data manipulation toolkit.