Pandas Drop

Pandas is a powerful and flexible open-source data analysis and manipulation library for Python. A common task in data analysis is removing unwanted data from a dataset. The drop function (the DataFrame.drop method) removes rows or columns by their labels. This article walks through the main ways to use it, along with the related drop_duplicates and dropna methods, with an explanation and an example for each case.

Introduction to Pandas Drop

The drop function is used to remove rows or columns from a DataFrame. It can be used to drop specific rows by index or columns by name. The drop function has the following basic syntax:

DataFrame.drop(labels=None, axis=0, index=None, columns=None, level=None, inplace=False, errors='raise')

Parameters:

  • labels: Single label or list-like. Index or column labels to drop.
  • axis: {0 or 'index', 1 or 'columns'}, default 0. Whether to drop labels from the index (0 or 'index') or columns (1 or 'columns').
  • index: Single label or list-like. Alternative to specifying axis (labels, axis=0 is equivalent to index=labels).
  • columns: Single label or list-like. Alternative to specifying axis (labels, axis=1 is equivalent to columns=labels).
  • level: For MultiIndex, level from which the labels will be removed.
  • inplace: bool, default False. If True, do operation inplace and return None.
  • errors: {'ignore', 'raise'}, default 'raise'. If 'ignore', suppress error and only existing labels are dropped.
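As a minimal sketch of how these parameters relate, the calls below remove the same column in three equivalent ways. The DataFrame and its values are illustrative only.

```python
import pandas as pd

# Illustrative data (not taken from any real dataset)
df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6], 'C': [7, 8, 9]})

# Three equivalent ways to drop column 'B'
by_axis_number = df.drop('B', axis=1)
by_axis_name = df.drop('B', axis='columns')
by_keyword = df.drop(columns='B')

print(by_axis_number.equals(by_axis_name))  # True
print(by_axis_name.equals(by_keyword))      # True
print(list(df.columns))                     # ['A', 'B', 'C'], the original is untouched
```

The columns= and index= keywords tend to read more clearly than a bare axis number, which is why many code bases prefer them.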

Dropping Columns by Name

One of the most common uses of the drop function is to remove columns from a DataFrame by specifying their names.

Example 1: Dropping a Single Column

import pandas as pd

# Create a sample DataFrame
df = pd.DataFrame({
    'A': [1, 2, 3],
    'B': [4, 5, 6],
    'C': [7, 8, 9]
})

# Drop column 'B'
df = df.drop('B', axis=1)

print(df)

Output:

   A  C
0  1  7
1  2  8
2  3  9

In this example, the column ‘B’ is dropped from the DataFrame. The axis=1 parameter indicates that we are dropping a column.

Explanation:

  • import pandas as pd: Import the Pandas library.
  • df = pd.DataFrame: Create a sample DataFrame with columns ‘A’, ‘B’, and ‘C’.
  • df.drop('B', axis=1): Drop the column 'B' from the DataFrame.
  • print(df): Print the resulting DataFrame to the console.

Example 2: Dropping Multiple Columns

import pandas as pd

# Create a sample DataFrame
df = pd.DataFrame({
    'A': [1, 2, 3],
    'B': [4, 5, 6],
    'C': [7, 8, 9],
    'D': [10, 11, 12]
})

# Drop columns 'B' and 'C'
df = df.drop(['B', 'C'], axis=1)

print(df)

Output:

   A   D
0  1  10
1  2  11
2  3  12

In this example, columns ‘B’ and ‘C’ are dropped from the DataFrame.

Explanation:

  • df = pd.DataFrame: Create a sample DataFrame with columns ‘A’, ‘B’, ‘C’, and ‘D’.
  • df.drop(['B', 'C'], axis=1): Drop the columns 'B' and 'C' from the DataFrame.
  • print(df): Print the resulting DataFrame to the console.
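Because drop accepts any list of labels, the list can be built programmatically. The following is a small sketch under the assumption that temporary columns share a 'tmp_' prefix; the column names and the prefix are hypothetical.

```python
import pandas as pd

# Illustrative data; the 'tmp_' prefix is a hypothetical naming convention
df = pd.DataFrame({
    'id': [1, 2, 3],
    'tmp_raw': [4, 5, 6],
    'tmp_flag': [7, 8, 9],
    'score': [10, 11, 12],
})

# Collect the matching column names, then drop them in one call
temp_columns = [name for name in df.columns if name.startswith('tmp_')]
cleaned = df.drop(columns=temp_columns)

print(list(cleaned.columns))  # ['id', 'score']
```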

Dropping Rows by Index

The drop function can also be used to remove rows from a DataFrame by specifying their index.

Example 3: Dropping a Single Row

import pandas as pd

# Create a sample DataFrame
df = pd.DataFrame({
    'A': [1, 2, 3],
    'B': [4, 5, 6],
    'C': [7, 8, 9]
})

# Drop row with index 1
df = df.drop(1, axis=0)

print(df)

Output:

   A  B  C
0  1  4  7
2  3  6  9

In this example, the row with index 1 is dropped from the DataFrame. The axis=0 parameter indicates that we are dropping a row.

Explanation:

  • df = pd.DataFrame: Create a sample DataFrame with columns ‘A’, ‘B’, and ‘C’.
  • df.drop(1, axis=0): Drop the row with index 1 from the DataFrame.
  • print(df): Print the resulting DataFrame to the console.

Example 4: Dropping Multiple Rows

import pandas as pd

# Create a sample DataFrame
df = pd.DataFrame({
    'A': [1, 2, 3, 4],
    'B': [5, 6, 7, 8],
    'C': [9, 10, 11, 12]
})

# Drop rows with indices 1 and 3
df = df.drop([1, 3], axis=0)

print(df)

Output:

   A  B   C
0  1  5   9
2  3  7  11

In this example, rows with indices 1 and 3 are dropped from the DataFrame.

Explanation:

  • df = pd.DataFrame: Create a sample DataFrame with columns ‘A’, ‘B’, and ‘C’.
  • df.drop([1, 3], axis=0): Drop the rows with indices 1 and 3 from the DataFrame.
  • print(df): Print the resulting DataFrame to the console.
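Note that drop matches index labels, not positions. With the default integer index the two coincide, but they differ once the index holds other labels. The sketch below uses an illustrative string index to show dropping by label, by position, and by condition.

```python
import pandas as pd

# Illustrative data with string row labels
df = pd.DataFrame(
    {'A': [1, 2, 3, 4], 'B': [5, 6, 7, 8]},
    index=['w', 'x', 'y', 'z'],
)

# Drop by label
without_x = df.drop('x')

# Drop by position: look the labels up first, then drop them
without_first_two = df.drop(df.index[[0, 1]])

# Drop by condition: pass the index of the matching rows
without_large_a = df.drop(df[df['A'] > 2].index)

print(list(without_x.index))          # ['w', 'y', 'z']
print(list(without_first_two.index))  # ['y', 'z']
print(list(without_large_a.index))    # ['w', 'x']
```

For condition-based filtering, boolean indexing such as df[df['A'] <= 2] gives the same rows and is often the simpler choice.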

Dropping Rows or Columns Using the index or columns Parameters

Instead of using the labels and axis parameters, you can use the index or columns parameters to drop rows or columns.

Example 5: Dropping Columns Using the columns Parameter

import pandas as pd

# Create a sample DataFrame
df = pd.DataFrame({
    'A': [1, 2, 3],
    'B': [4, 5, 6],
    'C': [7, 8, 9]
})

# Drop column 'B' using the columns parameter
df = df.drop(columns=['B'])

print(df)

Output:

   A  C
0  1  7
1  2  8
2  3  9

In this example, the column ‘B’ is dropped using the columns parameter.

Explanation:

  • df = pd.DataFrame: Create a sample DataFrame with columns ‘A’, ‘B’, and ‘C’.
  • df.drop(columns=['B']): Drop the column 'B' using the columns parameter.
  • print(df): Print the resulting DataFrame to the console.

Example 6: Dropping Rows Using the index Parameter

import pandas as pd

# Create a sample DataFrame
df = pd.DataFrame({
    'A': [1, 2, 3],
    'B': [4, 5, 6],
    'C': [7, 8, 9]
})

# Drop row with index 1 using the index parameter
df = df.drop(index=[1])

print(df)

Output:

   A  B  C
0  1  4  7
2  3  6  9

In this example, the row with index 1 is dropped using the index parameter.

Explanation:

  • df = pd.DataFrame: Create a sample DataFrame with columns ‘A’, ‘B’, and ‘C’.
  • df.drop(index=[1]): Drop the row with index 1 using the index parameter.
  • print(df): Print the resulting DataFrame to the console.
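The index and columns parameters can also be combined, which removes rows and columns in a single call. A minimal sketch with illustrative data:

```python
import pandas as pd

# Illustrative data
df = pd.DataFrame({
    'A': [1, 2, 3],
    'B': [4, 5, 6],
    'C': [7, 8, 9]
})

# Drop rows 0 and 2 and column 'C' at the same time
result = df.drop(index=[0, 2], columns=['C'])

print(result)
```

Only row 1 and columns 'A' and 'B' remain in result; df itself is unchanged.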

Dropping with the inplace Parameter

By default, the drop function returns a new DataFrame with the specified labels removed and leaves the original unchanged. To modify the original DataFrame instead, pass inplace=True. The call then returns None, so its result should not be assigned back to a variable.

Example 7: Dropping a Column In Place

import pandas as pd

# Create a sample DataFrame
df = pd.DataFrame({
    'A': [1, 2, 3],
    'B': [4, 5, 6],
    'C': [7, 8, 9]
})

# Drop column 'B' inplace
df.drop('B', axis=1, inplace=True)

print(df)

Output:

   A  C
0  1  7
1  2  8
2  3  9

In this example, the column ‘B’ is dropped from the DataFrame in place.

Explanation:

  • df.drop('B', axis=1, inplace=True): Drop the column 'B' from the DataFrame in place, modifying the original DataFrame.
  • print(df): Print the resulting DataFrame to the console.

Example 8: Dropping a Row In Place

import pandas as pd

# Create a sample DataFrame
df = pd.DataFrame({
    'A': [1, 2, 3],
    'B': [4, 5, 6],
    'C': [7, 8, 9]
})

# Drop row with index 1 inplace
df.drop(1, axis=0, inplace=True)

print(df)

Output:

   A  B  C
0  1  4  7
2  3  6  9

In this example, the row with index 1 is dropped from the DataFrame in place.

Explanation:

  • df.drop(1, axis=0, inplace=True): Drop the row with index 1 from the DataFrame in place, modifying the original DataFrame.
  • print(df): Print the resulting DataFrame to the console.
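A common pitfall is assigning the result of an in-place call back to the variable, which replaces the DataFrame with None. The sketch below, using illustrative data, contrasts the two modes.

```python
import pandas as pd

# Illustrative data
df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]})

# Without inplace: a new DataFrame comes back and df is unchanged
trimmed = df.drop(columns='B')
print(list(df.columns))       # ['A', 'B']
print(list(trimmed.columns))  # ['A']

# With inplace=True: df itself changes and the call returns None
returned = df.drop(columns='A', inplace=True)
print(returned)               # None
print(list(df.columns))       # ['B']
```

Writing df = df.drop(..., inplace=True) would therefore leave df set to None. Reassigning the returned DataFrame, as in the earlier examples, avoids this.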

Dropping Labels with Error Handling

The drop function has an errors parameter which can be used to handle cases where the labels to be dropped do not exist.

Example 9: Dropping a Non-Existent Column with Error Handling

import pandas as pd

# Create a sample DataFrame
df = pd.DataFrame({
    'A': [1, 2, 3],
    'B': [4, 5, 6],
    'C': [7, 8, 9]
})

# Attempt to drop a non-existent column 'D' with error handling
df = df.drop('D', axis=1, errors='ignore')

print(df)

Output:

   A  B  C
0  1  4  7
1  2  5  8
2  3  6  9

In this example, an attempt is made to drop a non-existent column ‘D’. The errors='ignore' parameter suppresses the error that would normally be raised.

Explanation:

  • df.drop('D', axis=1, errors='ignore'): Attempt to drop the non-existent column 'D' and suppress any error that would be raised.
  • print(df): Print the resulting DataFrame to the console.

Example 10: Dropping a Non-Existent Row with Error Handling

import pandas as pd

# Create a sample DataFrame
df = pd.DataFrame({
    'A': [1, 2, 3],
    'B': [4, 5, 6],
    'C': [7, 8, 9]
})

# Attempt to drop a non-existent row with index 4 with error handling
df = df.drop(4, axis=0, errors='ignore')

print(df)

Output:

   A  B  C
0  1  4  7
1  2  5  8
2  3  6  9

In this example, an attempt is made to drop a non-existent row with index 4. The errors='ignore' parameter suppresses the error that would normally be raised.

Explanation:

  • df.drop(4, axis=0, errors='ignore'): Attempt to drop the non-existent row with index 4 and suppress any error that would be raised.
  • print(df): Print the resulting DataFrame to the console.
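For comparison, here is a short sketch of the default behavior, errors='raise', which raises a KeyError for a missing label. It also shows that errors='ignore' still drops the labels that do exist. The data is illustrative.

```python
import pandas as pd

# Illustrative data
df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]})

# Default errors='raise': a missing label raises KeyError
try:
    df.drop(columns='D')
    raised = False
except KeyError as exc:
    raised = True
    print(f"KeyError: {exc}")

# errors='ignore': existing labels are dropped, missing ones are skipped
partial = df.drop(columns=['B', 'D'], errors='ignore')
print(list(partial.columns))  # ['A']
```

Keeping the default is usually safer while exploring data, because a typo in a label then fails loudly instead of silently doing nothing.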

Dropping Labels from MultiIndex DataFrames

Pandas also allows dropping labels from MultiIndex DataFrames by specifying the level in which the labels should be matched. The level itself stays in the index; only the rows carrying the given label are removed.

Example 11: Dropping Labels at a Level of a MultiIndex DataFrame

import pandas as pd

# Create a sample MultiIndex DataFrame
index = pd.MultiIndex.from_tuples([('A', 1), ('A', 2), ('B', 1), ('B', 2)])
df = pd.DataFrame({
    'C': [3, 4, 5, 6],
    'D': [7, 8, 9, 10]
}, index=index)

# Drop every row whose level-0 label is 'A'
df = df.drop('A', level=0)

print(df)

Output:

     C   D
B 1  5   9
  2  6  10

In this example, every row whose level-0 label is 'A' is dropped from the MultiIndex DataFrame.

Explanation:

  • index = pd.MultiIndex.from_tuples: Create a MultiIndex for the DataFrame.
  • df = pd.DataFrame: Create a sample MultiIndex DataFrame with columns ‘C’ and ‘D’.
  • df.drop('A', level=0): Drop the rows labeled 'A' in level 0.
  • print(df): Print the resulting DataFrame to the console.
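A few related operations are easy to confuse with the one above. The following sketch, on an illustrative MultiIndex, shows dropping by an inner-level label, dropping a single row by its full tuple, and removing an index level altogether with the separate droplevel method.

```python
import pandas as pd

# Illustrative MultiIndex DataFrame
index = pd.MultiIndex.from_tuples([('A', 1), ('A', 2), ('B', 1), ('B', 2)])
df = pd.DataFrame({'C': [3, 4, 5, 6]}, index=index)

# Drop every row whose level-1 label is 2
no_twos = df.drop(2, level=1)

# Drop one exact row by giving its full tuple
no_a1 = df.drop(index=('A', 1))

# Remove level 0 from the index entirely (rows are kept)
flattened = df.droplevel(0)

print(list(no_twos.index))    # [('A', 1), ('B', 1)]
print(list(no_a1.index))      # [('A', 2), ('B', 1), ('B', 2)]
print(list(flattened.index))  # [1, 2, 1, 2]
```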

Dropping Duplicates in DataFrames

Duplicate rows are removed with a separate method, drop_duplicates, rather than with drop itself. By default it keeps the first occurrence of each duplicated row.

Example 12: Dropping Duplicate Rows

import pandas as pd

# Create a sample DataFrame with duplicate rows
df = pd.DataFrame({
    'A': [1, 2, 2, 3],
    'B': [4, 5, 5, 6],
    'C': [7, 8, 8, 9]
})

# Drop duplicate rows
df = df.drop_duplicates()

print(df)

Output:

   A  B  C
0  1  4  7
1  2  5  8
3  3  6  9

In this example, duplicate rows are dropped from the DataFrame.

Explanation:

  • df = pd.DataFrame: Create a sample DataFrame with columns ‘A’, ‘B’, and ‘C’ and some duplicate rows.
  • df.drop_duplicates(): Drop duplicate rows from the DataFrame.
  • print(df): Print the resulting DataFrame to the console.

Example 13: Dropping Duplicate Rows Based on Specific Columns

import pandas as pd

# Create a sample DataFrame with duplicate rows
df = pd.DataFrame({
    'A': [1, 2, 2, 3],
    'B': [4, 5, 5, 6],
    'C': [7, 8, 8, 9]
})

# Drop duplicate rows based on columns 'A' and 'B'
df = df.drop_duplicates(subset=['A', 'B'])

print(df)

Output:

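   A  B  C
0  1  4  7
1  2  5  8
3  3  6  9

In this example, duplicate rows are dropped based on the values in columns 'A' and 'B'. Because the duplicated rows also match in column 'C', the result is the same as in Example 12.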

Explanation:

  • df = pd.DataFrame: Create a sample DataFrame with columns ‘A’, ‘B’, and ‘C’ and some duplicate rows.
  • df.drop_duplicates(subset=['A', 'B']): Drop duplicate rows based on the values in columns 'A' and 'B'.
  • print(df): Print the resulting DataFrame to the console.
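Which copy of a duplicated row survives is controlled by the keep parameter of drop_duplicates. The sketch below uses illustrative data to compare the options, along with ignore_index for renumbering the result.

```python
import pandas as pd

# Illustrative data: rows 1 and 2 are identical
df = pd.DataFrame({'A': [1, 2, 2, 3], 'B': [4, 5, 5, 6]})

keep_first = df.drop_duplicates()             # default keep='first'
keep_last = df.drop_duplicates(keep='last')   # keep the last occurrence
keep_none = df.drop_duplicates(keep=False)    # drop every duplicated row
renumbered = df.drop_duplicates(ignore_index=True)  # reset the index to 0..n-1

print(list(keep_first.index))  # [0, 1, 3]
print(list(keep_last.index))   # [0, 2, 3]
print(list(keep_none.index))   # [0, 3]
print(list(renumbered.index))  # [0, 1, 2]
```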

Dropping NaN Values

Missing values are handled by another related method, dropna, which drops rows (the default) or columns that contain missing values (NaN).

Example 14: Dropping Rows with NaN Values

import pandas as pd
import numpy as np

# Create a sample DataFrame with NaN values
df = pd.DataFrame({
    'A': [1, 2, np.nan, 4],
    'B': [5, np.nan, 7, 8],
    'C': [9, 10, 11, np.nan]
})

# Drop rows with NaN values
df = df.dropna()

print(df)

Output:

     A    B    C
0  1.0  5.0  9.0

In this example, rows with NaN values are dropped from the DataFrame.

Explanation:

  • import numpy as np: Import the NumPy library for handling NaN values.
  • df = pd.DataFrame: Create a sample DataFrame with columns ‘A’, ‘B’, and ‘C’ and some NaN values.
  • df.dropna(): Drop rows with NaN values from the DataFrame.
  • print(df): Print the resulting DataFrame to the console.

Example 15: Dropping Columns with NaN Values

import pandas as pd
import numpy as np

# Create a sample DataFrame with NaN values
df = pd.DataFrame({
    'A': [1, 2, np.nan, 4],
    'B': [5, np.nan, 7, 8],
    'C': [9, 10, 11, np.nan]
})

# Drop columns with NaN values
df = df.dropna(axis=1)

print(df)

Output:

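Empty DataFrame
Columns: []
Index: [0, 1, 2, 3]

In this example, every column contains at least one NaN value, so all three columns are dropped and only an empty DataFrame with the original index remains.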

Explanation:

  • df.dropna(axis=1): Drop columns with NaN values from the DataFrame.
  • print(df): Print the resulting DataFrame to the console.
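Dropping every row or column that contains any NaN can discard more data than intended, as Example 15 shows. The sketch below uses illustrative data to demonstrate the how, thresh, and subset parameters of dropna, which make the rule less strict.

```python
import numpy as np
import pandas as pd

# Illustrative data: row 0 is complete, row 2 is entirely missing
df = pd.DataFrame({
    'A': [1, 2, np.nan, 4],
    'B': [5, np.nan, np.nan, 8],
    'C': [9, np.nan, np.nan, np.nan],
})

# Drop only rows in which every value is NaN
all_missing_removed = df.dropna(how='all')

# Keep rows that have at least two non-NaN values
at_least_two = df.dropna(thresh=2)

# Only consider column 'C' when looking for NaN
c_present = df.dropna(subset=['C'])

print(list(all_missing_removed.index))  # [0, 1, 3]
print(list(at_least_two.index))         # [0, 3]
print(list(c_present.index))            # [0]
```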

Pandas Drop Conclusion

The drop function in Pandas is a versatile tool for removing unwanted rows or columns from a DataFrame. It can drop labels by name or by index, tolerate missing labels through the errors parameter, and target a single level of a MultiIndex. The related drop_duplicates and dropna methods cover duplicate rows and missing values.

By understanding the parameters these methods accept, you can handle a wide range of data cleaning tasks and get your DataFrame ready for analysis. The examples in this article illustrate each option so that you can apply the same techniques to your own datasets.