Pandas DataFrame loc KeyError
Pandas is a powerful data manipulation library in Python. It provides data structures and functions needed to manipulate structured data. One of the most commonly used data structures in pandas is DataFrame. DataFrame is a two-dimensional labeled data structure with columns of potentially different types. It is similar to a spreadsheet or SQL table, or a dictionary of Series objects.
One of the most common operations on a DataFrame is selecting or filtering data based on some condition. This is where the loc indexer comes in. loc accesses a group of rows and columns by label(s) or by a boolean array. However, when using loc you may sometimes encounter a KeyError. This article explains why this error occurs and how to handle it.
Understanding KeyError
A KeyError in Python is raised when a dictionary key is not found among the existing keys. Similarly, in pandas, a KeyError is raised when the requested label is not found in the DataFrame’s index or columns.
For example, consider the following DataFrame:
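import pandas as pd
# Build a small DataFrame; pandas assigns the default integer index 0..3
data = {'Name': ['John', 'Anna', 'Peter', 'Linda'],
        'Age': [28, 24, 35, 32],
        'City': ['New York', 'Paris', 'Berlin', 'London']}
df = pd.DataFrame(data)
print(df)
Output:
    Name  Age      City
0   John   28  New York
1   Anna   24     Paris
2  Peter   35    Berlin
3  Linda   32    London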
If you try to access a row with a label that does not exist in the DataFrame’s index, you will get a KeyError:
import pandas as pd
data = {'Name': ['John', 'Anna', 'Peter', 'Linda'],
        'Age': [28, 24, 35, 32],
        'City': ['New York', 'Paris', 'Berlin', 'London']}
df = pd.DataFrame(data)
# The label 4 is not in the index, so this line raises KeyError
print(df.loc[4])
In the above example, the DataFrame’s index only contains the labels 0, 1, 2, and 3. Therefore, trying to access the row with the label 4 raises a KeyError.
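A related cause is that loc looks up labels, not positions. The following is a minimal sketch of that distinction; the name df_named and the choice of the Name column as the index are assumptions made only for illustration.

```python
import pandas as pd

data = {'Name': ['John', 'Anna', 'Peter', 'Linda'],
        'Age': [28, 24, 35, 32],
        'City': ['New York', 'Paris', 'Berlin', 'London']}
# Assumption for illustration: use the Name column as the index
df_named = pd.DataFrame(data).set_index('Name')

# Label-based lookup succeeds when the label exists
anna_age = df_named.loc['Anna', 'Age']

# The integer 0 is not a label in this string index, so loc raises KeyError
try:
    df_named.loc[0]
    raised = False
except KeyError:
    raised = True

# Position-based lookup with iloc still works for the first row
first_city = df_named.iloc[0]['City']

print(anna_age, raised, first_city)
```

If the intent is "the first row" rather than "the row labeled 0", iloc is usually the better fit.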
Handling KeyError
There are several ways to handle a KeyError when using loc in pandas.
Check if the Key Exists
Before trying to access a row or column, you can check if the key exists in the DataFrame’s index or columns. Here is an example:
import pandas as pd
data = {'Name': ['John', 'Anna', 'Peter', 'Linda'],
        'Age': [28, 24, 35, 32],
        'City': ['New York', 'Paris', 'Berlin', 'London']}
df = pd.DataFrame(data)
# Test the integer label 4; the string '4' would never match this integer index
if 4 in df.index:
    print(df.loc[4])
else:
    print('Key not found')
Output:
Key not found
In this example, before trying to access the row with the label 4, we check if the key exists in the DataFrame’s index. If the key exists, we access the row; otherwise, we print a message saying that the key was not found.
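The same membership check can cover columns as well as rows. The sketch below wraps it in a hypothetical helper named safe_loc (not part of pandas) and also shows one possible way to keep only the labels that exist when selecting several rows at once, using Index.intersection.

```python
import pandas as pd

data = {'Name': ['John', 'Anna', 'Peter', 'Linda'],
        'Age': [28, 24, 35, 32],
        'City': ['New York', 'Paris', 'Berlin', 'London']}
df = pd.DataFrame(data)


def safe_loc(frame, row_label, col_label, default=None):
    """Hypothetical helper: return frame.loc[row_label, col_label],
    or default when either label is missing."""
    if row_label in frame.index and col_label in frame.columns:
        return frame.loc[row_label, col_label]
    return default


existing = safe_loc(df, 1, 'City')                      # both labels exist
missing_row = safe_loc(df, 4, 'City')                   # row label 4 is absent
missing_col = safe_loc(df, 1, 'Country', default='unknown')  # column is absent

# For a list of labels, keep only those present in the index
wanted = [1, 3, 4]
present = df.index.intersection(wanted)
subset = df.loc[present]

print(existing, missing_row, missing_col)
print(subset)
```

Returning a default keeps the calling code free of try/except blocks, at the cost of hiding which label was missing.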
Use the get_loc Method
Another way to handle a KeyError is to use the get_loc method of the index. For a unique index, get_loc returns an integer giving the position of the label in the index. If the label is not in the index, it raises a KeyError. Here is an example:
import pandas as pd
data = {'Name': ['John', 'Anna', 'Peter', 'Linda'],
        'Age': [28, 24, 35, 32],
        'City': ['New York', 'Paris', 'Berlin', 'London']}
df = pd.DataFrame(data)
try:
    # Translate the label into a position, then select by position
    position = df.index.get_loc(4)
    print(df.iloc[position])
except KeyError:
    print('Key not found')
Output:
Key not found
In this example, we first try to get the position of the label 4 in the DataFrame’s index. If the label is not in the index, get_loc raises a KeyError, and we print a message saying that the key was not found.
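One caveat: get_loc returns a single integer only when the label is unique. The sketch below uses small made-up indexes (the names unique_idx, sorted_dupes, and unsorted_dupes are illustrative) to show the other return types, all of which iloc accepts.

```python
import pandas as pd

unique_idx = pd.Index(['a', 'b', 'c'])
sorted_dupes = pd.Index(['a', 'b', 'b'])
unsorted_dupes = pd.Index(['a', 'b', 'c', 'b'])

pos_unique = unique_idx.get_loc('b')      # a single integer position
pos_slice = sorted_dupes.get_loc('b')     # a slice covering adjacent duplicates
pos_mask = unsorted_dupes.get_loc('b')    # a boolean mask for scattered duplicates

# iloc accepts an integer, a slice, or a boolean array, so the result
# of get_loc can be passed to it in each of these cases
df_dupes = pd.DataFrame({'value': [10, 20, 30, 40]}, index=unsorted_dupes)
rows = df_dupes.iloc[pos_mask]

print(pos_unique, pos_slice, pos_mask)
print(rows)
```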
Use the at Accessor
The at accessor is similar to loc, but it accesses only a single value at a time. If the specified label is not in the DataFrame’s index or columns, it raises a KeyError. Here is an example:
import pandas as pd
data = {'Name': ['John', 'Anna', 'Peter', 'Linda'],
        'Age': [28, 24, 35, 32],
        'City': ['New York', 'Paris', 'Berlin', 'London']}
df = pd.DataFrame(data)
try:
    # at takes a row label and a column label and returns one scalar
    print(df.at[4, 'Name'])
except KeyError:
    print('Key not found')
Output:
Key not found
In this example, we try to access the value in the ‘Name’ column of the row with the label 4. If the label is not in the DataFrame’s index, at raises a KeyError, and we print a message saying that the key was not found.
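When debugging, it can help to report which key was missing rather than a fixed message. The sketch below does this with a hypothetical helper named describe_lookup (not part of pandas) that includes the caught exception in the returned text; it covers both a missing row label and a missing column label.

```python
import pandas as pd

data = {'Name': ['John', 'Anna', 'Peter', 'Linda'],
        'Age': [28, 24, 35, 32],
        'City': ['New York', 'Paris', 'Berlin', 'London']}
df = pd.DataFrame(data)


def describe_lookup(frame, row_label, col_label):
    """Hypothetical helper: return the value at the given labels,
    or a message that includes the key reported by the KeyError."""
    try:
        return frame.at[row_label, col_label]
    except KeyError as err:
        return f'Key not found: {err}'


found = describe_lookup(df, 0, 'Name')            # both labels exist
missing_row = describe_lookup(df, 4, 'Name')      # row label 4 is absent
missing_col = describe_lookup(df, 0, 'Country')   # column label is absent

print(found)
print(missing_row)
print(missing_col)
```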
Use the isin Method
The isin method of the index returns a boolean array indicating whether each index label is contained in the specified values. Because a boolean mask selects only the matching rows, passing it to loc does not raise a KeyError for labels that are missing. Here is an example:
import pandas as pd
data = {'Name': ['John', 'Anna', 'Peter', 'Linda'],
        'Age': [28, 24, 35, 32],
        'City': ['New York', 'Paris', 'Berlin', 'London']}
df = pd.DataFrame(data)
labels = [4]
# True where an index label appears in labels, False elsewhere
mask = df.index.isin(labels)
print(df.loc[mask])
Output:
Empty DataFrame
Columns: [Name, Age, City]
Index: []
In this example, we first create a boolean mask that is True for rows whose labels are in the specified list and False for all other rows. We then pass this mask to loc. Because the label 4 is not in the index, every entry of the mask is False and the result is an empty DataFrame rather than a KeyError.
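The mask approach is most useful when a list mixes existing and missing labels. The following sketch, using an illustrative list of labels, selects the rows that exist and separately collects the labels that were not found.

```python
import pandas as pd

data = {'Name': ['John', 'Anna', 'Peter', 'Linda'],
        'Age': [28, 24, 35, 32],
        'City': ['New York', 'Paris', 'Berlin', 'London']}
df = pd.DataFrame(data)

# Illustrative labels: 1 and 3 exist in the index, 4 does not
labels = [1, 3, 4]

# Boolean mask over the index; missing labels simply match nothing
mask = df.index.isin(labels)
selected = df.loc[mask]

# Collect the labels that were requested but are absent from the index
missing = [label for label in labels if label not in df.index]

print(selected)
print('Missing labels:', missing)
```

Note that the selected rows come back in index order, not in the order of the labels list.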
Conclusion
In this article, we discussed the KeyError that can occur when using loc in pandas. We explained why the error occurs, namely that the requested label is not present in the index or columns, and showed several ways to handle it: checking for the key first, using get_loc, using at inside a try/except block, and filtering with isin.