Pandas Drop
Pandas is a powerful and flexible open-source data analysis and manipulation tool built on top of the Python programming language. One of the common tasks in data analysis is to remove unwanted data from a dataset. The drop function in Pandas is used to drop specified labels from rows or columns. This article will delve into the various ways you can use the drop function in Pandas, providing detailed explanations and examples for each case.
Introduction to Pandas Drop
The drop function is used to remove rows or columns from a DataFrame. It can be used to drop specific rows by index or columns by name. The drop function has the following basic syntax:
DataFrame.drop(labels=None, axis=0, index=None, columns=None, level=None, inplace=False, errors='raise')
Parameters:
- labels: Single label or list-like. Index or column labels to drop.
- axis: {0 or ‘index’, 1 or ‘columns’}, default 0. Whether to drop labels from the index (0 or ‘index’) or columns (1 or ‘columns’).
- index: Single label or list-like. Alternative to specifying axis (labels, axis=0 is equivalent to index=labels).
- columns: Single label or list-like. Alternative to specifying axis (labels, axis=1 is equivalent to columns=labels).
- level: For MultiIndex, level from which the labels will be removed.
- inplace: bool, default False. If True, do operation inplace and return None.
- errors: {‘ignore’, ‘raise’}, default ‘raise’. If ‘ignore’, suppress error and only existing labels are dropped.
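As a brief sketch of how these parameters combine, the index and columns keywords can be passed together to remove rows and columns in a single call. The sample data and the variable name trimmed below are made up for illustration only.

```python
import pandas as pd

# Illustrative data (not taken from any real dataset)
df = pd.DataFrame({
    'A': [1, 2, 3],
    'B': [4, 5, 6],
    'C': [7, 8, 9]
})

# Remove the row labelled 0 and the column 'C' in one call
trimmed = df.drop(index=0, columns='C')
print(trimmed)

# inplace defaults to False, so the original DataFrame keeps its shape
print(df.shape)
```

Because inplace is False by default, the result has to be assigned to a name (here trimmed) to be kept.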
Dropping Columns by Name
One of the most common uses of the drop function is to remove columns from a DataFrame by specifying their names.
Example 1: Dropping a Single Column
import pandas as pd
# Create a sample DataFrame
df = pd.DataFrame({
'A': [1, 2, 3],
'B': [4, 5, 6],
'C': [7, 8, 9]
})
# Drop column 'B'
df = df.drop('B', axis=1)
print(df)
Output:
   A  C
0  1  7
1  2  8
2  3  9
In this example, the column ‘B’ is dropped from the DataFrame. The axis=1 parameter indicates that we are dropping a column.
Explanation:
- import pandas as pd: Import the Pandas library.
- df = pd.DataFrame: Create a sample DataFrame with columns ‘A’, ‘B’, and ‘C’.
- df.drop('B', axis=1): Drop the column ‘B’ from the DataFrame.
- print(df): Print the resulting DataFrame to the console.
Example 2: Dropping Multiple Columns
import pandas as pd
# Create a sample DataFrame
df = pd.DataFrame({
'A': [1, 2, 3],
'B': [4, 5, 6],
'C': [7, 8, 9],
'D': [10, 11, 12]
})
# Drop columns 'B' and 'C'
df = df.drop(['B', 'C'], axis=1)
print(df)
Output:
   A   D
0  1  10
1  2  11
2  3  12
In this example, columns ‘B’ and ‘C’ are dropped from the DataFrame.
Explanation:
- df = pd.DataFrame: Create a sample DataFrame with columns ‘A’, ‘B’, ‘C’, and ‘D’.
- df.drop(['B', 'C'], axis=1): Drop the columns ‘B’ and ‘C’ from the DataFrame.
- print(df): Print the resulting DataFrame to the console.
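The list of labels passed to drop does not have to be typed by hand; it can be built from the column names. The following is a minimal sketch, assuming a DataFrame whose temporary columns share a hypothetical 'tmp_' prefix (the column names are invented for this illustration).

```python
import pandas as pd

# Hypothetical column names, chosen only for this illustration
df = pd.DataFrame({
    'id': [1, 2],
    'tmp_a': [0.1, 0.2],
    'tmp_b': [0.3, 0.4],
    'score': [10, 20]
})

# Collect every column whose name starts with 'tmp_'
to_drop = [name for name in df.columns if name.startswith('tmp_')]

# Drop all of the collected columns at once
result = df.drop(columns=to_drop)
print(result)
```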
Dropping Rows by Index
The drop function can also be used to remove rows from a DataFrame by specifying their index.
Example 3: Dropping a Single Row
import pandas as pd
# Create a sample DataFrame
df = pd.DataFrame({
'A': [1, 2, 3],
'B': [4, 5, 6],
'C': [7, 8, 9]
})
# Drop row with index 1
df = df.drop(1, axis=0)
print(df)
Output:
   A  B  C
0  1  4  7
2  3  6  9
In this example, the row with index 1 is dropped from the DataFrame. The axis=0 parameter indicates that we are dropping a row.
Explanation:
- df = pd.DataFrame: Create a sample DataFrame with columns ‘A’, ‘B’, and ‘C’.
- df.drop(1, axis=0): Drop the row with index 1 from the DataFrame.
- print(df): Print the resulting DataFrame to the console.
Example 4: Dropping Multiple Rows
import pandas as pd
# Create a sample DataFrame
df = pd.DataFrame({
'A': [1, 2, 3, 4],
'B': [5, 6, 7, 8],
'C': [9, 10, 11, 12]
})
# Drop rows with indices 1 and 3
df = df.drop([1, 3], axis=0)
print(df)
Output:
   A  B   C
0  1  5   9
2  3  7  11
In this example, rows with indices 1 and 3 are dropped from the DataFrame.
Explanation:
- df = pd.DataFrame: Create a sample DataFrame with columns ‘A’, ‘B’, and ‘C’.
- df.drop([1, 3], axis=0): Drop the rows with indices 1 and 3 from the DataFrame.
- print(df): Print the resulting DataFrame to the console.
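Two related points may be worth sketching here: rows can be dropped by condition by first selecting their index labels, and drop leaves the surviving labels unchanged rather than renumbering them. The example below is one possible approach under those assumptions; the variable names are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'A': [1, 2, 3, 4],
    'B': [5, 6, 7, 8]
})

# Select the index labels of rows where 'A' is even, then drop those labels
even_labels = df.index[df['A'] % 2 == 0]
odd_rows = df.drop(even_labels)
print(odd_rows)  # the surviving index labels are 0 and 2

# reset_index(drop=True) renumbers the rows from 0 when a contiguous
# index is needed, discarding the old labels
renumbered = odd_rows.reset_index(drop=True)
print(renumbered)
```

For purely condition-based filtering, boolean indexing such as df[df['A'] % 2 != 0] gives the same rows; the drop form is mainly useful when the labels are already at hand.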
Dropping Rows or Columns Using the index or columns Parameters
Instead of using the labels and axis parameters, you can use the index or columns parameters to drop rows or columns.
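To make the equivalence concrete, the sketch below drops the same labels both ways and compares the results with DataFrame.equals. The variable names are illustrative.

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6], 'C': [7, 8, 9]})

# Column drop: labels + axis=1 versus the columns keyword
by_axis = df.drop('B', axis=1)
by_keyword = df.drop(columns='B')
same_columns = by_axis.equals(by_keyword)

# Row drop: labels + axis=0 versus the index keyword
rows_by_axis = df.drop([0, 2], axis=0)
rows_by_keyword = df.drop(index=[0, 2])
same_rows = rows_by_axis.equals(rows_by_keyword)

print(same_columns, same_rows)
```

The keyword forms tend to read more clearly because the call itself says whether rows or columns are being removed.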
Example 5: Dropping Columns Using the columns Parameter
import pandas as pd
# Create a sample DataFrame
df = pd.DataFrame({
'A': [1, 2, 3],
'B': [4, 5, 6],
'C': [7, 8, 9]
})
# Drop column 'B' using the columns parameter
df = df.drop(columns=['B'])
print(df)
Output:
   A  C
0  1  7
1  2  8
2  3  9
In this example, the column ‘B’ is dropped using the columns parameter.
Explanation:
- df = pd.DataFrame: Create a sample DataFrame with columns ‘A’, ‘B’, and ‘C’.
- df.drop(columns=['B']): Drop the column ‘B’ using the columns parameter.
- print(df): Print the resulting DataFrame to the console.
Example 6: Dropping Rows Using the index Parameter
import pandas as pd
# Create a sample DataFrame
df = pd.DataFrame({
'A': [1, 2, 3],
'B': [4, 5, 6],
'C': [7, 8, 9]
})
# Drop row with index 1 using the index parameter
df = df.drop(index=[1])
print(df)
Output:
   A  B  C
0  1  4  7
2  3  6  9
In this example, the row with index 1 is dropped using the index parameter.
Explanation:
- df = pd.DataFrame: Create a sample DataFrame with columns ‘A’, ‘B’, and ‘C’.
- df.drop(index=[1]): Drop the row with index 1 using the index parameter.
- print(df): Print the resulting DataFrame to the console.
Dropping with the inplace Parameter
By default, the drop function returns a new DataFrame with the specified labels dropped. If you want to modify the original DataFrame without creating a new one, you can use the inplace parameter.
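A common pitfall is assigning the result of an in-place call back to the variable, which replaces the DataFrame with None. The sketch below contrasts the two modes; the variable names are illustrative.

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]})

# Without inplace, drop returns a new object and df keeps column 'B'
copy_result = df.drop(columns='B')
columns_after_copy = list(df.columns)

# With inplace=True, df itself changes and the call returns None,
# so the return value should not be assigned back to df
return_value = df.drop(columns='B', inplace=True)
print(return_value)
print(list(df.columns))
```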
Example 7: Dropping a Column Inplace
import pandas as pd
# Create a sample DataFrame
df = pd.DataFrame({
'A': [1, 2, 3],
'B': [4, 5, 6],
'C': [7, 8, 9]
})
# Drop column 'B' inplace
df.drop('B', axis=1, inplace=True)
print(df)
Output:
   A  C
0  1  7
1  2  8
2  3  9
In this example, the column ‘B’ is dropped from the DataFrame in place.
Explanation:
- df.drop('B', axis=1, inplace=True): Drop the column ‘B’ from the DataFrame in place, modifying the original DataFrame.
- print(df): Print the resulting DataFrame to the console.
Example 8: Dropping a Row Inplace
import pandas as pd
# Create a sample DataFrame
df = pd.DataFrame({
'A': [1, 2, 3],
'B': [4, 5, 6],
'C': [7, 8, 9]
})
# Drop row with index 1 inplace
df.drop(1, axis=0, inplace=True)
print(df)
Output:
   A  B  C
0  1  4  7
2  3  6  9
In this example, the row with index 1 is dropped from the DataFrame in place.
Explanation:
- df.drop(1, axis=0, inplace=True): Drop the row with index 1 from the DataFrame in place, modifying the original DataFrame.
- print(df): Print the resulting DataFrame to the console.
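Dropping Labels with Error Handling
The drop function has an errors parameter that controls what happens when a label to be dropped does not exist. With the default, errors='raise', a missing label raises a KeyError; with errors='ignore', missing labels are skipped and only existing labels are dropped.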
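For contrast with errors='ignore', here is a small sketch of the default behaviour, where dropping a missing label raises a KeyError that can be caught explicitly. The flag name raised is illustrative.

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]})

try:
    # 'D' does not exist and errors='raise' is the default
    df.drop(columns='D')
    raised = False
except KeyError as exc:
    raised = True
    print(f"KeyError: {exc}")

# With errors='ignore', existing labels are dropped and missing ones are skipped
result = df.drop(columns=['B', 'D'], errors='ignore')
print(list(result.columns))
```

Keeping the default is often the safer choice in pipelines, since a misspelled column name then fails loudly instead of being skipped without notice.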
Example 9: Dropping a Non-Existent Column with Error Handling
import pandas as pd
# Create a sample DataFrame
df = pd.DataFrame({
'A': [1, 2, 3],
'B': [4, 5, 6],
'C': [7, 8, 9]
})
# Attempt to drop a non-existent column 'D' with error handling
df = df.drop('D', axis=1, errors='ignore')
print(df)
Output:
   A  B  C
0  1  4  7
1  2  5  8
2  3  6  9
In this example, an attempt is made to drop a non-existent column ‘D’. The errors='ignore' parameter suppresses the KeyError that would normally be raised, so the DataFrame is returned unchanged.
Explanation:
- df.drop('D', axis=1, errors='ignore'): Attempt to drop the non-existent column ‘D’ and suppress any error that would be raised.
- print(df): Print the resulting DataFrame to the console.
Example 10: Dropping a Non-Existent Row with Error Handling
import pandas as pd
# Create a sample DataFrame
df = pd.DataFrame({
'A': [1, 2, 3],
'B': [4, 5, 6],
'C': [7, 8, 9]
})
# Attempt to drop a non-existent row with index 4 with error handling
df = df.drop(4, axis=0, errors='ignore')
print(df)
Output:
   A  B  C
0  1  4  7
1  2  5  8
2  3  6  9
In this example, an attempt is made to drop a non-existent row with index 4. The errors='ignore' parameter suppresses the KeyError that would normally be raised, so the DataFrame is returned unchanged.
Explanation:
- df.drop(4, axis=0, errors='ignore'): Attempt to drop the non-existent row with index 4 and suppress any error that would be raised.
- print(df): Print the resulting DataFrame to the console.
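Dropping Labels from a MultiIndex Level
Pandas also allows for dropping labels from MultiIndex DataFrames by specifying the level from which the labels should be dropped. This removes the rows that carry the given label at that level; the level itself remains part of the index.
Example 11: Dropping a Label from a Level of a MultiIndex DataFrame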
import pandas as pd
# Create a sample MultiIndex DataFrame
index = pd.MultiIndex.from_tuples([('A', 1), ('A', 2), ('B', 1), ('B', 2)])
df = pd.DataFrame({
'C': [3, 4, 5, 6],
'D': [7, 8, 9, 10]
}, index=index)
# Drop every row whose level 0 label is 'A'
df = df.drop('A', level=0)
print(df)
Output:
     C   D
B 1  5   9
  2  6  10
In this example, every row whose level 0 label is ‘A’ is dropped from the MultiIndex DataFrame, leaving only the ‘B’ rows.
Explanation:
- index = pd.MultiIndex.from_tuples: Create a MultiIndex for the DataFrame.
- df = pd.DataFrame: Create a sample MultiIndex DataFrame with columns ‘C’ and ‘D’.
- df.drop('A', level=0): Drop the rows labelled ‘A’ at level 0.
- print(df): Print the resulting DataFrame to the console.
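The same idea extends to other levels and to single rows. As a minimal sketch, the code below drops a label from level 1 and then drops one specific row by passing its full index tuple; the variable names are illustrative.

```python
import pandas as pd

index = pd.MultiIndex.from_tuples([('A', 1), ('A', 2), ('B', 1), ('B', 2)])
df = pd.DataFrame({'C': [3, 4, 5, 6]}, index=index)

# Drop every row whose level 1 label is 2
without_twos = df.drop(2, level=1)
print(without_twos)

# Drop a single row by giving its complete index tuple
without_b1 = df.drop(index=('B', 1))
print(without_b1)
```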
Dropping Duplicates in DataFrames
Removing duplicate rows is handled by a separate, related function, drop_duplicates, rather than by drop itself. By default it keeps the first occurrence of each duplicated row.
Example 12: Dropping Duplicate Rows
import pandas as pd
# Create a sample DataFrame with duplicate rows
df = pd.DataFrame({
'A': [1, 2, 2, 3],
'B': [4, 5, 5, 6],
'C': [7, 8, 8, 9]
})
# Drop duplicate rows
df = df.drop_duplicates()
print(df)
Output:
   A  B  C
0  1  4  7
1  2  5  8
3  3  6  9
In this example, duplicate rows are dropped from the DataFrame.
Explanation:
- df = pd.DataFrame: Create a sample DataFrame with columns ‘A’, ‘B’, and ‘C’ and some duplicate rows.
- df.drop_duplicates(): Drop duplicate rows from the DataFrame.
- print(df): Print the resulting DataFrame to the console.
Example 13: Dropping Duplicate Rows Based on Specific Columns
import pandas as pd
# Create a sample DataFrame with duplicate rows
df = pd.DataFrame({
'A': [1, 2, 2, 3],
'B': [4, 5, 5, 6],
'C': [7, 8, 8, 9]
})
# Drop duplicate rows based on columns 'A' and 'B'
df = df.drop_duplicates(subset=['A', 'B'])
print(df)
Output:
   A  B  C
0  1  4  7
1  2  5  8
3  3  6  9
In this example, duplicate rows are dropped based on the values in columns ‘A’ and ‘B’.
Explanation:
- df = pd.DataFrame: Create a sample DataFrame with columns ‘A’, ‘B’, and ‘C’ and some duplicate rows.
- df.drop_duplicates(subset=['A', 'B']): Drop duplicate rows based on the values in columns ‘A’ and ‘B’.
- print(df): Print the resulting DataFrame to the console.
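Which copy of a duplicated row survives is controlled by the keep parameter of drop_duplicates. The sketch below compares its three settings on a small made-up DataFrame; the variable names are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'A': [1, 2, 2, 3],
    'B': [4, 5, 5, 6]
})

# keep='first' is the default: the first occurrence (index 1) survives
keep_first = df.drop_duplicates()

# keep='last': the last occurrence (index 2) survives
keep_last = df.drop_duplicates(keep='last')

# keep=False: every copy of a duplicated row is removed
keep_none = df.drop_duplicates(keep=False)

print(list(keep_first.index), list(keep_last.index), list(keep_none.index))
```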
Dropping NaN Values
The related dropna function is used to drop rows or columns that contain missing values (NaN).
Example 14: Dropping Rows with NaN Values
import pandas as pd
import numpy as np
# Create a sample DataFrame with NaN values
df = pd.DataFrame({
'A': [1, 2, np.nan, 4],
'B': [5, np.nan, 7, 8],
'C': [9, 10, 11, np.nan]
})
# Drop rows with NaN values
df = df.dropna()
print(df)
Output:
     A    B    C
0  1.0  5.0  9.0
In this example, rows with NaN values are dropped from the DataFrame.
Explanation:
- import numpy as np: Import the NumPy library for handling NaN values.
- df = pd.DataFrame: Create a sample DataFrame with columns ‘A’, ‘B’, and ‘C’ and some NaN values.
- df.dropna(): Drop rows with NaN values from the DataFrame.
- print(df): Print the resulting DataFrame to the console.
Example 15: Dropping Columns with NaN Values
import pandas as pd
import numpy as np
# Create a sample DataFrame with NaN values
df = pd.DataFrame({
'A': [1, 2, np.nan, 4],
'B': [5, np.nan, 7, 8],
'C': [9, 10, 11, np.nan]
})
# Drop columns with NaN values
df = df.dropna(axis=1)
print(df)
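Output:
Empty DataFrame
Columns: []
Index: [0, 1, 2, 3]
In this example, columns with NaN values are dropped from the DataFrame. Because every column in the sample data contains at least one NaN, all three columns are removed and the result is an empty DataFrame that keeps the original index.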
Explanation:
- df.dropna(axis=1): Drop columns with NaN values from the DataFrame.
- print(df): Print the resulting DataFrame to the console.
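By default dropna removes a row or column as soon as it contains a single NaN, which can be too aggressive, as the previous example shows. The sketch below illustrates three less strict options, how, thresh, and subset, on made-up data; the variable names are illustrative.

```python
import pandas as pd
import numpy as np

df = pd.DataFrame({
    'A': [1, np.nan, np.nan, 4],
    'B': [5, np.nan, 7, np.nan],
    'C': [9, np.nan, np.nan, 12]
})

# how='all': drop only the rows in which every value is NaN
all_missing_removed = df.dropna(how='all')

# thresh=2: keep only the rows that have at least 2 non-NaN values
at_least_two = df.dropna(thresh=2)

# subset=['B']: only look at column 'B' when deciding which rows to drop
b_present = df.dropna(subset=['B'])

print(list(all_missing_removed.index))
print(list(at_least_two.index))
print(list(b_present.index))
```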
Pandas Drop Conclusion
The drop function in Pandas is a versatile tool for data manipulation, allowing you to remove unwanted rows or columns from your DataFrame. Whether you’re dropping specific labels, handling missing labels, or working with MultiIndex DataFrames, the drop function provides a straightforward way to clean and manage your data, while the related drop_duplicates and dropna functions cover duplicates and missing values.
By understanding the various parameters and options available with the drop function, you can perform a wide range of data cleaning tasks, ensuring your DataFrame is ready for analysis. The detailed examples provided in this article illustrate the different ways you can use the drop function, making it easier for you to apply these techniques to your own datasets.