Pandas loc Function
The loc indexer in pandas is a powerful tool for selecting data from a DataFrame by label. Although it is often called a function, loc is a property used with square brackets rather than called with parentheses. It is one of the most commonly used tools for data manipulation and querying in the pandas library. This article explores loc in detail, with examples of its usage.
Introduction to Pandas loc
Pandas is a Python library for data manipulation and analysis. It provides many tools for performing complex data operations with ease. One of these is loc, which performs label-based indexing: it accesses a group of rows and columns by labels or by a boolean array.
Basic Syntax of loc
The basic syntax of loc is:
dataframe.loc[row_labels, column_labels]
Here, row_labels and column_labels can each be a single label, a list of labels, a slice of labels, or a boolean array.
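The four kinds of selector listed above can be sketched side by side. This is a minimal illustration, not part of the original examples: the string index a, b, c is an assumption chosen so that labels cannot be mistaken for positions.

```python
import pandas as pd

# Hypothetical frame; the string row labels are an assumption for illustration
df = pd.DataFrame(
    {
        "Website": ["pandasdataframe.com", "example.com", "testsite.com"],
        "Visits": [1000, 1500, 800],
    },
    index=["a", "b", "c"],
)

single = df.loc["a", "Visits"]                   # single labels -> one value
listed = df.loc[["a", "c"], ["Visits"]]          # lists of labels -> DataFrame
sliced = df.loc["a":"b", "Website":"Visits"]     # label slices -> both ends included
masked = df.loc[[True, False, True], "Visits"]   # boolean array -> rows where True

print(single)
print(listed)
print(sliced)
print(masked)
```

Each selector type changes the shape of the result: a pair of single labels gives one value, while lists and slices keep the DataFrame structure.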
Examples of Using loc
Example 1: Selecting a Single Row by Label
import pandas as pd
data = {
'Website': ['pandasdataframe.com', 'example.com', 'testsite.com'],
'Visits': [1000, 1500, 800]
}
df = pd.DataFrame(data)
result = df.loc[0]
print(result)
Output: a Series for the row labeled 0, with Website pandasdataframe.com and Visits 1000.
Example 2: Selecting Multiple Rows by Labels
import pandas as pd
data = {
'Website': ['pandasdataframe.com', 'example.com', 'testsite.com'],
'Visits': [1000, 1500, 800]
}
df = pd.DataFrame(data)
result = df.loc[[0, 2]]
print(result)
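Output: a DataFrame containing the rows labeled 0 and 2 (pandasdataframe.com with 1000 visits and testsite.com with 800).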
Example 3: Selecting Slices of Rows
import pandas as pd
data = {
'Website': ['pandasdataframe.com', 'example.com', 'testsite.com'],
'Visits': [1000, 1500, 800]
}
df = pd.DataFrame(data)
result = df.loc[0:1]
print(result)
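Output: a DataFrame containing the rows labeled 0 and 1. Unlike a positional Python slice, a label slice in loc includes its end label.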
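Inclusive slicing is easier to see when the labels are not integers. The sketch below assumes a hypothetical string index (A, B, C) and uses iloc, the positional counterpart of loc, only for contrast.

```python
import pandas as pd

# Hypothetical string labels, chosen so labels and positions differ
df = pd.DataFrame({"Visits": [1000, 1500, 800]}, index=["A", "B", "C"])

by_label = df.loc["A":"B"]     # label slice: the end label "B" is included
by_position = df.iloc[0:1]     # positional slice: the stop position is excluded

print(len(by_label))
print(len(by_position))
```

The label slice returns two rows and the positional slice returns one, even though both start at the first row.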
Example 4: Selecting Columns by Label
import pandas as pd
data = {
'Website': ['pandasdataframe.com', 'example.com', 'testsite.com'],
'Visits': [1000, 1500, 800]
}
df = pd.DataFrame(data)
result = df.loc[:, 'Website']
print(result)
Output: the Website column as a Series containing all three site names.
Example 5: Selecting Multiple Columns by Labels
import pandas as pd
data = {
'Website': ['pandasdataframe.com', 'example.com', 'testsite.com'],
'Visits': [1000, 1500, 800]
}
df = pd.DataFrame(data)
result = df.loc[:, ['Website', 'Visits']]
print(result)
Output: a DataFrame with the Website and Visits columns for every row, which here is the whole DataFrame.
Example 6: Selecting Rows and Columns
import pandas as pd
data = {
'Website': ['pandasdataframe.com', 'example.com', 'testsite.com'],
'Visits': [1000, 1500, 800]
}
df = pd.DataFrame(data)
result = df.loc[0, 'Website']
print(result)
Output: the single value pandasdataframe.com.
Example 7: Selecting Rows by Condition
import pandas as pd
data = {
'Website': ['pandasdataframe.com', 'example.com', 'testsite.com'],
'Visits': [1000, 1500, 800]
}
df = pd.DataFrame(data)
result = df.loc[df['Visits'] > 1000]
print(result)
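Output: a DataFrame with one row, example.com with 1500 visits. The row with exactly 1000 visits is excluded because the comparison is strict.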
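The condition passed to loc in Example 7 is itself a boolean Series, and it can be stored, inspected, and negated before use. A minimal sketch, where the variable names mask, above, and not_above are illustrative only:

```python
import pandas as pd

df = pd.DataFrame({
    "Website": ["pandasdataframe.com", "example.com", "testsite.com"],
    "Visits": [1000, 1500, 800],
})

mask = df["Visits"] > 1000      # boolean Series aligned with df's index
print(mask)

above = df.loc[mask]            # rows where the mask is True
not_above = df.loc[~mask]       # ~ inverts the mask
print(above)
print(not_above)
```

Naming the mask can make longer filters easier to read and lets the same condition be reused for both selection and assignment.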
Example 8: Using Boolean Arrays
import pandas as pd
data = {
'Website': ['pandasdataframe.com', 'example.com', 'testsite.com'],
'Visits': [1000, 1500, 800]
}
df = pd.DataFrame(data)
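result = df.loc[df['Website'].str.contains('pandasdataframe.com', regex=False)]
print(result)
Output: a DataFrame with the single row for pandasdataframe.com. By default str.contains treats its pattern as a regular expression, in which the dot matches any character; regex=False makes the match literal.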
Example 9: Updating Data Using loc
import pandas as pd
data = {
'Website': ['pandasdataframe.com', 'example.com', 'testsite.com'],
'Visits': [1000, 1500, 800]
}
df = pd.DataFrame(data)
df.loc[0, 'Visits'] = 1200
print(df)
Output: the full DataFrame, in which the Visits value for row 0 is now 1200.
Example 10: Adding a New Column Using loc
import pandas as pd
data = {
'Website': ['pandasdataframe.com', 'example.com', 'testsite.com'],
'Visits': [1000, 1500, 800]
}
df = pd.DataFrame(data)
df.loc[:, 'NewColumn'] = 'Sample'
print(df)
Output: the DataFrame with a third column, NewColumn, holding Sample in every row.
Advanced Usage of loc
Example 11: Using loc with MultiIndex DataFrames
import pandas as pd
tuples = [('pandasdataframe.com', '2021'), ('pandasdataframe.com', '2022'), ('example.com', '2021')]
index = pd.MultiIndex.from_tuples(tuples, names=['Website', 'Year'])
data = {
'Visits': [1000, 1200, 1500]
}
df = pd.DataFrame(data, index=index)
result = df.loc[('pandasdataframe.com', '2021')]
print(result)
Output: a Series for the row ('pandasdataframe.com', '2021'), with Visits 1000.
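With a MultiIndex, loc also accepts a partial key that names only the first level. The sketch below reuses the data from Example 11; the call to sort_index() is an added precaution, since partial lookups are generally better behaved on a sorted MultiIndex.

```python
import pandas as pd

tuples = [
    ("pandasdataframe.com", "2021"),
    ("pandasdataframe.com", "2022"),
    ("example.com", "2021"),
]
index = pd.MultiIndex.from_tuples(tuples, names=["Website", "Year"])
df = pd.DataFrame({"Visits": [1000, 1200, 1500]}, index=index).sort_index()

# Partial key: every row for one website; the Website level is dropped
one_site = df.loc["pandasdataframe.com"]
print(one_site)

# Full row key plus a column label: a single value
one_cell = df.loc[("pandasdataframe.com", "2022"), "Visits"]
print(one_cell)
```

The partial key returns a DataFrame indexed by Year alone, while the full key combined with a column label returns one value.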
Example 12: Conditional Selection with loc and Functions
import pandas as pd
data = {
'Website': ['pandasdataframe.com', 'example.com', 'testsite.com'],
'Visits': [1000, 1500, 800]
}
df = pd.DataFrame(data)
result = df.loc[df['Visits'].apply(lambda x: x > 1000)]
print(result)
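Output: a DataFrame with one row, example.com with 1500 visits. The result is the same as in Example 7, where the comparison is applied to the whole column at once.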
Example 13: Selecting Rows Using Index Data
import pandas as pd
data = {
'Website': ['pandasdataframe.com', 'example.com', 'testsite.com'],
'Visits': [1000, 1500, 800]
}
df = pd.DataFrame(data, index=['A', 'B', 'C'])
result = df.loc['A']
print(result)
Output: a Series for the row labeled A, with Website pandasdataframe.com and Visits 1000.
Example 14: Complex Conditions
import pandas as pd
data = {
'Website': ['pandasdataframe.com', 'example.com', 'testsite.com'],
'Visits': [1000, 1500, 800]
}
df = pd.DataFrame(data)
result = df.loc[(df['Visits'] > 800) & (df['Website'].str.contains('com'))]
print(result)
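Output: a DataFrame with the rows for pandasdataframe.com (1000 visits) and example.com (1500 visits). The testsite.com row is excluded because 800 is not greater than 800.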
Example 15: Using loc to Select Rows and Columns Simultaneously
import pandas as pd
data = {
'Website': ['pandasdataframe.com', 'example.com', 'testsite.com'],
'Visits': [1000, 1500, 800]
}
df = pd.DataFrame(data)
result = df.loc[df['Visits'] > 1000, ['Website']]
print(result)
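Output: a one-column DataFrame (Website) with the single row example.com. Passing the list ['Website'] rather than the bare label keeps the result a DataFrame instead of a Series.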
Example 16: Modifying a Slice of the DataFrame
import pandas as pd
data = {
'Website': ['pandasdataframe.com', 'example.com', 'testsite.com'],
'Visits': [1000, 1500, 800]
}
df = pd.DataFrame(data)
df.loc[df['Visits'] > 1000, 'Visits'] = 2000
print(df)
Output: the full DataFrame, in which the Visits value for example.com has changed from 1500 to 2000.
Example 17: Using loc with a Callable
import pandas as pd
data = {
'Website': ['pandasdataframe.com', 'example.com', 'testsite.com'],
'Visits': [1000, 1500, 800]
}
df = pd.DataFrame(data)
result = df.loc[lambda df: df['Visits'] > 1000, ['Website']]
print(result)
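Output: the same one-column DataFrame as in Example 15, containing example.com. The callable receives the DataFrame and returns the boolean mask used for selection.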
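A callable is most useful in a method chain, where the intermediate DataFrame has no variable name to refer to. The following sketch assumes a hypothetical derived column, VisitsK, that does not appear in the original examples.

```python
import pandas as pd

df = pd.DataFrame({
    "Website": ["pandasdataframe.com", "example.com", "testsite.com"],
    "Visits": [1000, 1500, 800],
})

result = (
    df.assign(VisitsK=lambda d: d["Visits"] / 1000)   # hypothetical column: visits in thousands
      .loc[lambda d: d["VisitsK"] >= 1.0, ["Website", "VisitsK"]]
)
print(result)
```

The lambda passed to loc receives the frame produced by assign, so it can filter on VisitsK even though that column never existed on df itself.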
Example 18: Selecting Data with a List of Labels
import pandas as pd
data = {
'Website': ['pandasdataframe.com', 'example.com', 'testsite.com'],
'Visits': [1000, 1500, 800]
}
df = pd.DataFrame(data)
result = df.loc[[0, 2], 'Website']
print(result)
Output: a Series with the Website values for rows 0 and 2, pandasdataframe.com and testsite.com.
Example 19: Using loc to Update Multiple Columns
import pandas as pd
data = {
'Website': ['pandasdataframe.com', 'example.com', 'testsite.com'],
'Visits': [1000, 1500, 800],
'Revenue': [200, 300, 150]
}
df = pd.DataFrame(data)
df.loc[df['Visits'] > 800, ['Visits', 'Revenue']] = [1600, 400]
print(df)
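Output: the full DataFrame, in which the rows for pandasdataframe.com and example.com both have Visits 1600 and Revenue 400. The testsite.com row keeps 800 and 150.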
Example 20: Inserting a New Row Using loc
import pandas as pd
data = {
'Website': ['pandasdataframe.com', 'example.com', 'testsite.com'],
'Visits': [1000, 1500, 800]
}
df = pd.DataFrame(data)
df.loc[3] = ['newsite.com', 500]
print(df)
Output: the DataFrame with a fourth row, labeled 3, holding newsite.com and 500.
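Assigning through loc adds a row only when the label is not already in the index; with an existing label, the same statement overwrites that row. A short sketch of both cases, where replaced.com is a made-up value for illustration:

```python
import pandas as pd

df = pd.DataFrame({
    "Website": ["pandasdataframe.com", "example.com", "testsite.com"],
    "Visits": [1000, 1500, 800],
})

df.loc[3] = ["newsite.com", 500]     # label 3 is absent, so a row is appended
df.loc[0] = ["replaced.com", 50]     # label 0 exists, so that row is overwritten

print(len(df))
print(df.loc[0, "Website"])
```

Checking whether a label is in df.index before assigning is a simple way to avoid overwriting a row by accident.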
Pandas loc Function Conclusion
The loc indexer in pandas is a versatile tool for a wide range of data selection and manipulation tasks. Its label-based indexing is intuitive and especially useful when dealing with complex data sets. The examples in this article demonstrate the flexibility of loc in accessing and modifying data within a DataFrame. Whether you are filtering data, updating values, or adding new rows and columns, loc provides a robust way to perform these operations.
Mastering loc can significantly enhance your data manipulation capabilities in Python using pandas. Practice these examples and experiment with variations to fully grasp what loc can do.