Pandas DataFrame loc Column

Pandas is a Python library for data manipulation. It provides the data structures and functions needed to work with structured data. The most commonly used of these structures is the DataFrame, a two-dimensional labeled data structure whose columns can hold different types.

One of the most useful features of pandas is its indexing. Indexing means selecting particular rows and columns from a DataFrame: all of the rows and some of the columns, some of the rows and all of the columns, or a subset of both.

In this article, we will focus on one of the indexing tools in pandas, loc. Although it is often called a function, loc is an indexer that is used with square brackets rather than parentheses. It selects data by label, which means that we pass the names of the rows or columns we want to select. Unlike iloc, which selects by integer position and excludes the end of a slice, a loc slice includes its last element.
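The difference between label-based and position-based slicing is easiest to see side by side. The following is a minimal sketch; the string row labels are made up for illustration and do not appear in the examples later in this article.

```python
import pandas as pd

# Hypothetical frame with string row labels, used only for illustration.
df = pd.DataFrame({'Age': [28, 24, 35, 32]},
                  index=['john', 'anna', 'peter', 'linda'])

# loc slices by label and includes the end label 'peter'.
by_label = df.loc['john':'peter', 'Age']

# iloc slices by position and excludes the end position 2.
by_position = df.iloc[0:2, 0]

print(len(by_label))     # 3 rows: john, anna, peter
print(len(by_position))  # 2 rows: john, anna
```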

1. Selecting a Single Column with loc

Here is an example of how to select a single column with loc:

import pandas as pd

data = {'Name': ['John', 'Anna', 'Peter', 'Linda'],
        'Age': [28, 24, 35, 32],
        'City': ['New York', 'Paris', 'Berlin', 'London']}
df = pd.DataFrame(data)

print(df.loc[:, 'Name'])

In this example, we are selecting all rows (indicated by the colon) and the column ‘Name’.
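One detail worth knowing is the type of the result. As a small sketch on a shortened version of the same data: a single column label returns a Series, while a one-item list of labels returns a DataFrame with one column.

```python
import pandas as pd

df = pd.DataFrame({'Name': ['John', 'Anna', 'Peter', 'Linda'],
                   'Age': [28, 24, 35, 32]})

as_series = df.loc[:, 'Name']    # a single label gives a Series
as_frame = df.loc[:, ['Name']]   # a one-item list gives a DataFrame

print(type(as_series).__name__)  # Series
print(type(as_frame).__name__)   # DataFrame
print(as_frame.shape)            # (4, 1)
```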

2. Selecting Multiple Columns with loc

We can also select multiple columns with loc. Here is an example:

import pandas as pd

data = {'Name': ['John', 'Anna', 'Peter', 'Linda'],
        'Age': [28, 24, 35, 32],
        'City': ['New York', 'Paris', 'Berlin', 'London']}
df = pd.DataFrame(data)

print(df.loc[:, ['Name', 'City']])

In this example, we are selecting all rows and the columns ‘Name’ and ‘City’.
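The list of labels also controls the order of the returned columns, and in current pandas versions a label that does not exist raises a KeyError. The sketch below illustrates both points; 'Country' is a deliberately missing column name chosen for the example.

```python
import pandas as pd

df = pd.DataFrame({'Name': ['John', 'Anna'],
                   'Age': [28, 24],
                   'City': ['New York', 'Paris']})

# Columns come back in the order given in the list.
reordered = df.loc[:, ['City', 'Name']]
print(list(reordered.columns))  # ['City', 'Name']

# 'Country' is not a column of df, so the selection fails.
try:
    df.loc[:, ['Name', 'Country']]
    missing_raises = False
except KeyError:
    missing_raises = True
print(missing_raises)
```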

3. Selecting Rows and Columns with loc

We can also select both rows and columns with loc. Here is an example:

import pandas as pd

data = {'Name': ['John', 'Anna', 'Peter', 'Linda'],
        'Age': [28, 24, 35, 32],
        'City': ['New York', 'Paris', 'Berlin', 'London']}
df = pd.DataFrame(data)

print(df.loc[1:3, ['Name', 'City']])

In this example, we are selecting the rows with index 1 to 3 (inclusive) and the columns ‘Name’ and ‘City’.
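Here the numbers 1 and 3 are row labels, not positions. With the default index the two coincide, so the difference is easy to miss. A minimal sketch, assuming we sort the same data by age first, shows what changes once labels and positions no longer line up:

```python
import pandas as pd

df = pd.DataFrame({'Name': ['John', 'Anna', 'Peter', 'Linda'],
                   'Age': [28, 24, 35, 32]})

# Sorting keeps each row's original label, so the index becomes [1, 0, 3, 2].
by_age = df.sort_values('Age')

# loc walks from label 1 to label 3 in the current row order.
label_slice = by_age.loc[1:3, 'Name']

# iloc takes positions 1 and 2, whatever their labels are.
position_slice = by_age.iloc[1:3, 0]

print(list(label_slice))     # ['Anna', 'John', 'Linda']
print(list(position_slice))  # ['John', 'Linda']
```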

4. Selecting a Range of Columns with loc

We can also select a range of columns with loc. Here is an example:

import pandas as pd

data = {'A': [1, 2, 3, 4],
        'B': [5, 6, 7, 8],
        'C': [9, 10, 11, 12],
        'D': [13, 14, 15, 16]}
df = pd.DataFrame(data)

print(df.loc[:, 'A':'C'])

In this example, we are selecting all rows and the columns from ‘A’ to ‘C’ (inclusive).
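Either end of a label slice can be left open, in the same way as a Python list slice. The following sketch uses a two-row version of the same frame:

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2], 'B': [5, 6], 'C': [9, 10], 'D': [13, 14]})

from_b = df.loc[:, 'B':]    # 'B' through the last column
up_to_b = df.loc[:, :'B']   # first column through 'B', inclusive

print(list(from_b.columns))   # ['B', 'C', 'D']
print(list(up_to_b.columns))  # ['A', 'B']
```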

5. Selecting with Conditions with loc

We can also select rows and columns based on conditions with loc. Here is an example:

import pandas as pd

data = {'Name': ['John', 'Anna', 'Peter', 'Linda'],
        'Age': [28, 24, 35, 32],
        'City': ['New York', 'Paris', 'Berlin', 'London']}
df = pd.DataFrame(data)

print(df.loc[df['Age'] > 30, ['Name', 'City']])

In this example, we are selecting the rows where the age is greater than 30 and the columns ‘Name’ and ‘City’.
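Conditions can be combined before they are passed to loc. As a sketch on the same data: each comparison goes in its own parentheses and the parts are joined with & (and) or | (or), while isin tests membership in a list of values.

```python
import pandas as pd

df = pd.DataFrame({'Name': ['John', 'Anna', 'Peter', 'Linda'],
                   'Age': [28, 24, 35, 32],
                   'City': ['New York', 'Paris', 'Berlin', 'London']})

# Older than 25 and not living in Berlin.
mask = (df['Age'] > 25) & (df['City'] != 'Berlin')
older_not_berlin = df.loc[mask, ['Name', 'Age']]

# Living in one of several cities.
in_cities = df.loc[df['City'].isin(['Paris', 'London']), 'Name']

print(list(older_not_berlin['Name']))  # ['John', 'Linda']
print(list(in_cities))                 # ['Anna', 'Linda']
```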

6. Modifying Data with loc

We can also modify data in a DataFrame with loc. Here is an example:

import pandas as pd

data = {'Name': ['John', 'Anna', 'Peter', 'Linda'],
        'Age': [28, 24, 35, 32],
        'City': ['New York', 'Paris', 'Berlin', 'London']}
df = pd.DataFrame(data)

df.loc[df['Age'] > 30, 'City'] = 'San Francisco'

print(df)

In this example, we are changing the city to ‘San Francisco’ for all rows where the age is greater than 30.
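The same pattern works for in-place arithmetic and for masks built with isin. The sketch below is illustrative only; the value 'Remote' is an arbitrary placeholder.

```python
import pandas as pd

df = pd.DataFrame({'Name': ['John', 'Anna', 'Peter', 'Linda'],
                   'Age': [28, 24, 35, 32],
                   'City': ['New York', 'Paris', 'Berlin', 'London']})

# Increase the age by one for everyone living in Paris.
df.loc[df['City'] == 'Paris', 'Age'] += 1

# Overwrite the city for two named people ('Remote' is a placeholder value).
df.loc[df['Name'].isin(['John', 'Linda']), 'City'] = 'Remote'

print(list(df['Age']))   # [28, 25, 35, 32]
print(list(df['City']))  # ['Remote', 'Paris', 'Berlin', 'Remote']
```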

7. Adding Data with loc

We can also add data to a DataFrame with loc. Here is an example:

import pandas as pd

data = {'Name': ['John', 'Anna', 'Peter', 'Linda'],
        'Age': [28, 24, 35, 32],
        'City': ['New York', 'Paris', 'Berlin', 'London']}
df = pd.DataFrame(data)

df.loc[4] = ['Paul', 40, 'Tokyo']

print(df)

In this example, we are adding a new row with index 4 to the DataFrame.
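Whether an assignment adds or overwrites depends on whether the label already exists. A minimal sketch on a two-row frame, where the 'City' column and the value 'Unknown' are introduced only for illustration:

```python
import pandas as pd

df = pd.DataFrame({'Name': ['John', 'Anna'], 'Age': [28, 24]})

df.loc[2] = ['Peter', 35]      # label 2 does not exist yet: a row is added
df.loc[0] = ['Jon', 29]        # label 0 already exists: the row is overwritten
df.loc[:, 'City'] = 'Unknown'  # new column label: a column is added

print(df.shape)            # (3, 3)
print(df.loc[0, 'Name'])   # Jon
```

Because an existing label is silently overwritten, it is worth checking the index before using this pattern to append rows.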

8. Selecting with Boolean Arrays with loc

We can also select data with boolean arrays with loc. Here is an example:

import pandas as pd

data = {'Name': ['John', 'Anna', 'Peter', 'Linda'],
        'Age': [28, 24, 35, 32],
        'City': ['New York', 'Paris', 'Berlin', 'London']}
df = pd.DataFrame(data)

print(df.loc[df['Age'] > 30])

In this example, we are selecting all rows where the age is greater than 30.
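The boolean array does not have to come from a comparison. As a sketch, a plain Python list of booleans with one entry per row works too, and a boolean Series can be inverted with ~ to select the opposite set of rows:

```python
import pandas as pd

df = pd.DataFrame({'Name': ['John', 'Anna', 'Peter', 'Linda'],
                   'Age': [28, 24, 35, 32]})

# One boolean per row: keep the first and third rows.
keep = [True, False, True, False]
picked = df.loc[keep, 'Name']

# Invert a boolean Series to get everyone who is 30 or younger.
over_30 = df['Age'] > 30
not_over_30 = df.loc[~over_30, 'Name']

print(list(picked))       # ['John', 'Peter']
print(list(not_over_30))  # ['John', 'Anna']
```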

9. Selecting with MultiIndex with loc

If we have a DataFrame with MultiIndex, we can also select data with loc. Here is an example:

import pandas as pd

index = pd.MultiIndex.from_tuples([(i, j) for i in range(5) for j in range(5)])
df = pd.DataFrame({'A': range(25), 'B': range(25, 50)}, index=index)

print(df.loc[(1, 3):])

In this example, we are selecting all rows from the index entry (1, 3) through the end of the DataFrame. Slicing a MultiIndex this way relies on the index being sorted, which is the case here.
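Slicing is only one of several ways to address a MultiIndex. The sketch below uses a smaller 3 by 3 index; the level names 'outer' and 'inner' are assumptions added for readability and are not part of the example above.

```python
import pandas as pd

# Level names 'outer' and 'inner' are made up for this illustration.
index = pd.MultiIndex.from_product([range(3), range(3)],
                                   names=['outer', 'inner'])
df = pd.DataFrame({'A': range(9)}, index=index)

# A single value selects on the first level and drops that level.
outer_one = df.loc[1]

# A full tuple plus a column label returns a single value.
single = df.loc[(1, 2), 'A']

# pd.IndexSlice allows slicing on an inner level.
idx = pd.IndexSlice
inner_two = df.loc[idx[:, 2], :]

print(list(outer_one['A']))  # [3, 4, 5]
print(single)                # 5
print(list(inner_two['A']))  # [2, 5, 8]
```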

10. Selecting with a Callable with loc

We can also select data with a callable with loc. Here is an example:

import pandas as pd

df = pd.DataFrame({'A': range(10), 'B': range(10, 20)})

print(df.loc[lambda df: df['A'] > 5])

In this example, we are selecting all rows where the value in column ‘A’ is greater than 5.
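A callable is most useful in a method chain, where the intermediate DataFrame has no name of its own to refer to. A minimal sketch, in which the derived column 'C' is introduced only for this example:

```python
import pandas as pd

df = pd.DataFrame({'A': range(10), 'B': range(10, 20)})

result = (
    df.assign(C=lambda d: d['A'] * d['B'])        # add a derived column
      .loc[lambda d: d['C'] > 100, ['A', 'C']]    # filter on it in the same chain
)

print(list(result['A']))  # [7, 8, 9]
print(list(result['C']))  # [119, 144, 171]
```

The lambda receives the DataFrame as it exists at that point in the chain, so it can filter on column 'C' even though the original df does not have it.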

In conclusion, loc is a versatile tool for selecting and modifying data in pandas. It lets us select rows and columns by label, by condition, or with a callable, and assign to the selection through the same syntax, which makes it useful in most everyday DataFrame work.