Pandas DataFrame loc and iloc
Pandas is a powerful Python library for data manipulation that offers many ways to slice, filter, and reshape data. Two of its most useful indexers are loc and iloc, which select groups of rows and columns from a DataFrame. loc is label-based: you specify rows and columns by their index and column labels. iloc is position-based: you specify rows and columns by their integer positions, starting at 0.
This article provides a comprehensive guide to using loc and iloc in pandas, including detailed examples that illustrate their usage.
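One quick way to see the difference is to use a DataFrame whose integer index does not match the row positions. The sketch below uses a hypothetical index of [2, 0, 1]. The same number 0 picks a different row depending on whether it is read as a label (loc) or as a position (iloc).

```python
import pandas as pd

# Hypothetical index whose labels differ from the row positions 0, 1, 2
df = pd.DataFrame({'Name': ['Alice', 'Bob', 'Charlie']}, index=[2, 0, 1])

by_label = df.loc[0, 'Name']   # row whose label is 0 -> 'Bob'
by_position = df.iloc[0, 0]    # first row, first column -> 'Alice'
print(by_label, by_position)   # Bob Alice
```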
Understanding loc in Pandas
The loc indexer selects data by the explicit index, that is, by the labels of the rows and columns. It can accept:
- A single label
- A list or array of labels
- A slice object with labels (both endpoints are included)
- A boolean array
- A callable that takes the DataFrame and returns one of the above
Example 1: Selecting a single row by index label
import pandas as pd
data = {'Name': ['Alice', 'Bob', 'Charlie'], 'Age': [25, 30, 35]}
df = pd.DataFrame(data, index=['a', 'b', 'c'])
result = df.loc['a']
print(result)
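If you pass both a row label and a column label, loc returns a single value rather than a Series. The minimal sketch below reuses the same data. The .at accessor shown next to it is the pandas accessor for a single value, and it takes the same pair of labels.

```python
import pandas as pd

data = {'Name': ['Alice', 'Bob', 'Charlie'], 'Age': [25, 30, 35]}
df = pd.DataFrame(data, index=['a', 'b', 'c'])

# Row label + column label -> a single scalar value
age_a = df.loc['a', 'Age']
# .at only accepts one row label and one column label
age_a_at = df.at['a', 'Age']
print(age_a, age_a_at)  # 25 25
```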
Example 2: Selecting multiple rows by index label
import pandas as pd
data = {'Name': ['Alice', 'Bob', 'Charlie'], 'Age': [25, 30, 35]}
df = pd.DataFrame(data, index=['a', 'b', 'c'])
result = df.loc[['a', 'b']]
print(result)
Example 3: Selecting rows by a slice of index labels
import pandas as pd
data = {'Name': ['Alice', 'Bob', 'Charlie'], 'Age': [25, 30, 35]}
df = pd.DataFrame(data, index=['a', 'b', 'c'])
result = df.loc['a':'b']
print(result)
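The label slice in Example 3 includes its end label 'b'. Positional slicing with iloc follows normal Python rules and excludes the stop position. The following sketch contrasts the two on the same data:

```python
import pandas as pd

data = {'Name': ['Alice', 'Bob', 'Charlie'], 'Age': [25, 30, 35]}
df = pd.DataFrame(data, index=['a', 'b', 'c'])

# Label slice: both endpoints are included -> rows 'a' and 'b'
label_slice = df.loc['a':'b']
# Position slice: the stop position is excluded -> only row 0 ('a')
position_slice = df.iloc[0:1]

print(list(label_slice.index))     # ['a', 'b']
print(list(position_slice.index))  # ['a']
```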
Example 4: Selecting rows and columns
import pandas as pd
data = {'Name': ['Alice', 'Bob', 'Charlie'], 'Age': [25, 30, 35]}
df = pd.DataFrame(data, index=['a', 'b', 'c'])
result = df.loc['a':'b', 'Name']
print(result)
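The type of the result depends on how you give the column. A single column label, as in Example 4, yields a Series. A list of labels yields a DataFrame. Here is a short check on the same data:

```python
import pandas as pd

data = {'Name': ['Alice', 'Bob', 'Charlie'], 'Age': [25, 30, 35]}
df = pd.DataFrame(data, index=['a', 'b', 'c'])

as_series = df.loc['a':'b', 'Name']     # scalar column label -> Series
as_frame = df.loc['a':'b', ['Name']]    # list of column labels -> DataFrame
print(type(as_series).__name__, type(as_frame).__name__)  # Series DataFrame
```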
Example 5: Using boolean arrays
import pandas as pd
data = {'Name': ['Alice', 'Bob', 'Charlie'], 'Age': [25, 30, 35]}
df = pd.DataFrame(data, index=['a', 'b', 'c'])
result = df.loc[df['Age'] > 25]
print(result)
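The boolean array does not have to be a Series. In the sketch below, a plain Python list of booleans with one entry per row selects the same rows as Example 5. The mask values are chosen by hand for illustration.

```python
import pandas as pd

data = {'Name': ['Alice', 'Bob', 'Charlie'], 'Age': [25, 30, 35]}
df = pd.DataFrame(data, index=['a', 'b', 'c'])

# Illustrative hand-written mask; its length must equal the number of rows
mask = [False, True, True]
result = df.loc[mask]
print(result)
```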
Understanding iloc in Pandas
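The iloc indexer selects data by integer position, regardless of the index labels. It can accept:
- An integer
- A list or array of integers
- A slice object with integers (the stop position is excluded, as in standard Python slicing)
- A boolean array
- A callable that takes the DataFrame and returns one of the above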
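As with Python lists, iloc also accepts negative integers, which count back from the last row. Here is a small sketch on the same data:

```python
import pandas as pd

data = {'Name': ['Alice', 'Bob', 'Charlie'], 'Age': [25, 30, 35]}
df = pd.DataFrame(data)

last_row = df.iloc[-1]         # negative positions count from the end
last_two = df.iloc[-2:]        # slice covering the last two rows
print(last_row['Name'])        # Charlie
print(list(last_two['Name']))  # ['Bob', 'Charlie']
```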
Example 6: Selecting a single row by integer index
import pandas as pd
data = {'Name': ['Alice', 'Bob', 'Charlie'], 'Age': [25, 30, 35]}
df = pd.DataFrame(data)
result = df.iloc[0]
print(result)
Example 7: Selecting multiple rows by integer index
import pandas as pd
data = {'Name': ['Alice', 'Bob', 'Charlie'], 'Age': [25, 30, 35]}
df = pd.DataFrame(data)
result = df.iloc[[0, 1]]
print(result)
Example 8: Selecting rows by a slice of integer indices
import pandas as pd
data = {'Name': ['Alice', 'Bob', 'Charlie'], 'Age': [25, 30, 35]}
df = pd.DataFrame(data)
result = df.iloc[0:2]
print(result)
Example 9: Selecting rows and columns by integer index
import pandas as pd
data = {'Name': ['Alice', 'Bob', 'Charlie'], 'Age': [25, 30, 35]}
df = pd.DataFrame(data)
result = df.iloc[0:2, 1]  # rows at positions 0 and 1, column at position 1 ('Age')
print(result)
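iloc checks bounds differently for single integers and for slices. In the sketch below, a slice that runs past the end is simply truncated, but a single out-of-range position raises IndexError. Position 10 is an arbitrary out-of-range value.

```python
import pandas as pd

data = {'Name': ['Alice', 'Bob', 'Charlie'], 'Age': [25, 30, 35]}
df = pd.DataFrame(data)

# A slice past the end is truncated, as with Python lists
tail = df.iloc[1:10]
print(len(tail))  # 2

# A single position past the end raises IndexError
raised = False
try:
    df.iloc[10]
except IndexError:
    raised = True
print(raised)  # True
```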
Example 10: Using boolean arrays with iloc
import pandas as pd
data = {'Name': ['Alice', 'Bob', 'Charlie'], 'Age': [25, 30, 35]}
df = pd.DataFrame(data)
result = df.iloc[(df['Age'] > 25).values]  # iloc needs a plain boolean array, not a boolean Series
print(result)
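The pandas documentation recommends .to_numpy() over .values when you need a NumPy array. The sketch below builds the boolean mask once. It passes the Series directly to loc and its array form to iloc, then checks that both select the same rows.

```python
import pandas as pd

data = {'Name': ['Alice', 'Bob', 'Charlie'], 'Age': [25, 30, 35]}
df = pd.DataFrame(data)

mask = df['Age'] > 25                # boolean Series (carries an index)
via_loc = df.loc[mask]               # loc accepts the Series directly
via_iloc = df.iloc[mask.to_numpy()]  # iloc gets a plain NumPy boolean array
print(via_loc.equals(via_iloc))      # True
```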
Practical Examples and Use Cases
Example 11: Filtering rows based on conditions
import pandas as pd
data = {'Name': ['Alice', 'Bob', 'Charlie'], 'Age': [25, 30, 35]}
df = pd.DataFrame(data)
result = df.loc[df['Age'] > 25]
print(result)
Example 12: Selecting specific rows and columns
import pandas as pd
data = {'Name': ['Alice', 'Bob', 'Charlie'], 'Age': [25, 30, 35]}
df = pd.DataFrame(data)
result = df.loc[df['Age'] > 25, ['Name']]
print(result)
Example 13: Modifying a subset of a DataFrame
import pandas as pd
data = {'Name': ['Alice', 'Bob', 'Charlie'], 'Age': [25, 30, 35]}
df = pd.DataFrame(data)
df.loc[df['Age'] > 25, 'Age'] = 40
print(df)
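Example 13 writes into df itself. If you instead want an independent subset to modify, one approach is to take an explicit copy with .copy(), as in the sketch below. The original DataFrame is left unchanged.

```python
import pandas as pd

data = {'Name': ['Alice', 'Bob', 'Charlie'], 'Age': [25, 30, 35]}
df = pd.DataFrame(data)

# .copy() returns an independent DataFrame; changing it leaves df untouched
subset = df.loc[df['Age'] > 25].copy()
subset['Age'] = 0
print(df['Age'].tolist())      # [25, 30, 35]
print(subset['Age'].tolist())  # [0, 0]
```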
Example 14: Using iloc to reorder rows
import pandas as pd
data = {'Name': ['Alice', 'Bob', 'Charlie'], 'Age': [25, 30, 35]}
df = pd.DataFrame(data)
result = df.iloc[[2, 1, 0]]
print(result)
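You do not have to write the list of positions by hand. As one possible approach, NumPy's argsort can compute the positions that sort a column, and iloc then applies that order. The sketch below sorts by Age in descending order. Here, sort_values would give the same result.

```python
import pandas as pd

data = {'Name': ['Alice', 'Bob', 'Charlie'], 'Age': [25, 30, 35]}
df = pd.DataFrame(data)

# argsort gives the row positions that sort Age ascending;
# [::-1] reverses them so the oldest row comes first
order = df['Age'].to_numpy().argsort()[::-1]
result = df.iloc[order]
print(result['Name'].tolist())  # ['Charlie', 'Bob', 'Alice']
```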
Example 15: Chaining loc and iloc in one expression
import pandas as pd
data = {'Name': ['Alice', 'Bob', 'Charlie'], 'Age': [25, 30, 35]}
df = pd.DataFrame(data, index=['a', 'b', 'c'])
result = df.loc['a':'b'].iloc[:, 1]  # rows 'a' to 'b' by label, then the column at position 1 ('Age')
print(result)
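Chaining works, but each step creates an intermediate result. An alternative, sketched below, is to convert between labels and positions first and then make a single call. Use df.index[...] to turn positions into labels for loc, or df.columns.get_loc to turn a column label into a position for iloc.

```python
import pandas as pd

data = {'Name': ['Alice', 'Bob', 'Charlie'], 'Age': [25, 30, 35]}
df = pd.DataFrame(data, index=['a', 'b', 'c'])

# Convert row positions to labels, then make one loc call
by_labels = df.loc[df.index[0:2], 'Age']

# Or convert the column label to a position, then make one iloc call
age_pos = df.columns.get_loc('Age')
by_positions = df.iloc[0:2, age_pos]

print(by_labels.equals(by_positions))  # True
```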
Example 16: Using loc with a callable
import pandas as pd
data = {'Name': ['Alice', 'Bob', 'Charlie'], 'Age': [25, 30, 35]}
df = pd.DataFrame(data, index=['a', 'b', 'c'])
result = df.loc[lambda df: df['Age'] > 25]
print(result)
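Callables are most useful in method chains, where the intermediate DataFrame has no variable name. In the sketch below, the lambda filters on a column that assign() creates in the same expression. The column name AgeNextYear is hypothetical.

```python
import pandas as pd

data = {'Name': ['Alice', 'Bob', 'Charlie'], 'Age': [25, 30, 35]}
df = pd.DataFrame(data, index=['a', 'b', 'c'])

# The lambda receives the DataFrame returned by assign(), so it can filter
# on the new (hypothetical) AgeNextYear column without a temporary variable
result = (
    df.assign(AgeNextYear=df['Age'] + 1)
      .loc[lambda d: d['AgeNextYear'] > 30]
)
print(result)
```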
Example 17: Using iloc with a callable
import pandas as pd
data = {'Name': ['Alice', 'Bob', 'Charlie'], 'Age': [25, 30, 35]}
df = pd.DataFrame(data)
result = df.iloc[lambda df: [0, 2]]
print(result)
Example 18: Selecting rows based on custom criteria
import pandas as pd
data = {'Name': ['Alice', 'Bob', 'Charlie'], 'Age': [25, 30, 35]}
df = pd.DataFrame(data)
result = df.loc[df['Name'].str.startswith('A')]
print(result)
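Membership in a set of values is another common custom criterion. The sketch below uses Series.isin with an illustrative, hand-picked list of names to build the boolean mask for loc.

```python
import pandas as pd

data = {'Name': ['Alice', 'Bob', 'Charlie'], 'Age': [25, 30, 35]}
df = pd.DataFrame(data)

# isin() marks rows whose Name appears in the (illustrative) list
wanted = ['Alice', 'Charlie']
result = df.loc[df['Name'].isin(wanted)]
print(result['Name'].tolist())  # ['Alice', 'Charlie']
```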
Example 19: Using iloc to select columns
import pandas as pd
data = {'Name': ['Alice', 'Bob', 'Charlie'], 'Age': [25, 30, 35]}
df = pd.DataFrame(data)
result = df.iloc[:, [1]]
print(result)
Example 20: Complex boolean indexing with loc
import pandas as pd
data = {'Name': ['Alice', 'Bob', 'Charlie'], 'Age': [25, 30, 35]}
df = pd.DataFrame(data)
result = df.loc[(df['Age'] > 25) & (df['Name'].str.contains('o'))]
print(result)
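The same pattern extends to | (element-wise OR) and ~ (element-wise NOT). Each comparison needs its own parentheses because | and & bind more tightly than comparison operators such as > and ==. Here is a short sketch on the same data:

```python
import pandas as pd

data = {'Name': ['Alice', 'Bob', 'Charlie'], 'Age': [25, 30, 35]}
df = pd.DataFrame(data)

# OR: rows where Age is over 32 or the name is 'Alice'
either = df.loc[(df['Age'] > 32) | (df['Name'] == 'Alice')]
# NOT: rows whose name does not contain the letter 'e'
no_e = df.loc[~df['Name'].str.contains('e')]

print(either['Name'].tolist())  # ['Alice', 'Charlie']
print(no_e['Name'].tolist())    # ['Bob']
```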
In conclusion, loc and iloc are versatile tools for selecting and modifying data in pandas. Use loc when you know the row and column labels or have a boolean condition, and use iloc when you know the integer positions. Once you understand the difference, you can handle a wide range of data processing tasks more reliably.