Pandas Drop Column
Pandas is a powerful and flexible data manipulation library for Python, widely used for data analysis and data wrangling. One of the common tasks in data manipulation is removing unnecessary columns from a DataFrame. This article will delve into the various ways to drop columns in Pandas, providing comprehensive explanations and numerous examples to illustrate each method.
1. Introduction to Pandas
Pandas is an open-source library that provides high-performance, easy-to-use data structures and data analysis tools for Python. It is built on top of NumPy and integrates well with many other libraries in the Python ecosystem.
To get started with Pandas, you need to install it using pip:
pip install pandas
Once installed, you can import it in your Python script:
import pandas as pd
2. Understanding DataFrames
A DataFrame is a two-dimensional, size-mutable, and potentially heterogeneous tabular data structure with labeled axes (rows and columns). Think of it as a spreadsheet or SQL table, or a dict of Series objects.
Here is an example of creating a simple DataFrame:
import pandas as pd
data = {
'A': [1, 2, 3, 4],
'B': [5, 6, 7, 8],
'C': [9, 10, 11, 12]
}
df = pd.DataFrame(data)
print(df)
This DataFrame has three columns: A, B, and C.
3. Dropping Columns by Name
One of the most straightforward ways to drop a column is by its name. The drop method in Pandas provides a convenient way to do this.
import pandas as pd
data = {
'A': [1, 2, 3, 4],
'B': [5, 6, 7, 8],
'C': [9, 10, 11, 12]
}
df = pd.DataFrame(data)
df = df.drop(columns=['B'])
print(df)
In this example, the column ‘B’ is dropped from the DataFrame.
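By default, drop raises a KeyError when one of the requested labels is not in the DataFrame. The sketch below shows how the errors='ignore' argument of drop skips missing names instead; the column name 'Z' is a hypothetical label chosen because it does not exist in the frame.

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2], 'B': [3, 4], 'C': [5, 6]})

# 'Z' is a hypothetical name that is not a column of df.
try:
    df.drop(columns=['B', 'Z'])
    raised = False
except KeyError:
    raised = True

# With errors='ignore', labels that exist are dropped and the rest are skipped.
trimmed = df.drop(columns=['B', 'Z'], errors='ignore')

print(raised)
print(list(trimmed.columns))
```

This can be convenient in cleanup scripts that run against inputs whose columns vary, at the cost of silently hiding typos in column names.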
4. Dropping Columns by Index
Sometimes you may want to drop a column based on its position rather than its name. Because drop works with labels, you can look up the label at a given position through df.columns and pass it to drop.
import pandas as pd
data = {
'A': [1, 2, 3, 4],
'B': [5, 6, 7, 8],
'C': [9, 10, 11, 12]
}
df = pd.DataFrame(data)
df = df.drop(df.columns[1], axis=1)
print(df)
In this case, the column at index 1 (which is ‘B’) is dropped.
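The same lookup extends to several positions at once, and positional selection with iloc is an alternative that never goes through labels. The following is a minimal sketch on a hypothetical four-column frame, not the only way to do it:

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2], 'B': [3, 4], 'C': [5, 6], 'D': [7, 8]})

# Look up the labels at positions 0 and 2, then drop them by label.
by_label = df.drop(columns=df.columns[[0, 2]])

# Alternative: keep every position except 0 and 2 with iloc.
keep = [i for i in range(df.shape[1]) if i not in (0, 2)]
by_position = df.iloc[:, keep]

print(list(by_label.columns))
print(by_label.equals(by_position))
```

If column labels are duplicated, the label-based form drops every column sharing that label, while the iloc form affects only the chosen positions.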
5. Dropping Multiple Columns
You can drop multiple columns simultaneously by providing a list of column names to the drop method.
import pandas as pd
data = {
'A': [1, 2, 3, 4],
'B': [5, 6, 7, 8],
'C': [9, 10, 11, 12],
'D': [13, 14, 15, 16]
}
df = pd.DataFrame(data)
df = df.drop(columns=['B', 'D'])
print(df)
Here, columns ‘B’ and ‘D’ are dropped from the DataFrame.
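When most columns are being removed, it can be shorter to name the columns you want to keep. A minimal sketch, assuming a four-column layout like the one above, showing that selecting a list of columns gives the same result as dropping the others:

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2], 'B': [3, 4], 'C': [5, 6], 'D': [7, 8]})

# Drop the unwanted columns by name.
dropped = df.drop(columns=['B', 'D'])

# Or select the columns to keep; both forms return a new DataFrame.
selected = df[['A', 'C']]

print(dropped.equals(selected))
```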
6. Dropping Columns with the axis Parameter
The axis parameter specifies whether to drop labels from the rows (axis=0, the default) or the columns (axis=1). When you pass column labels to drop without the columns keyword, use axis=1.
import pandas as pd
data = {
'A': [1, 2, 3, 4],
'B': [5, 6, 7, 8],
'C': [9, 10, 11, 12]
}
df = pd.DataFrame(data)
df = df.drop('C', axis=1)
print(df)
In this example, the column ‘C’ is dropped by specifying axis=1.
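To make the effect of axis concrete, the sketch below applies drop along both axes of a small hypothetical frame and shows that axis=1, axis='columns', and the columns keyword are interchangeable ways of targeting columns:

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]})

# axis=0 interprets the label as a row index label.
rows_dropped = df.drop(0, axis=0)

# axis=1 interprets the label as a column name.
cols_dropped = df.drop('B', axis=1)

# Two equivalent spellings of the column drop.
same_with_name = df.drop('B', axis='columns')
same_with_keyword = df.drop(columns='B')

print(list(rows_dropped.index))
print(list(cols_dropped.columns))
```

The columns keyword tends to read more clearly than axis=1 because the intent is stated in the call itself.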
7. Dropping Columns in Place
By default, the drop method returns a new DataFrame with the specified columns removed. To modify the original DataFrame directly, you can use the inplace=True parameter.
import pandas as pd
data = {
'A': [1, 2, 3, 4],
'B': [5, 6, 7, 8],
'C': [9, 10, 11, 12]
}
df = pd.DataFrame(data)
df.drop('A', axis=1, inplace=True)
print(df)
In this case, the column ‘A’ is dropped in place, modifying the original DataFrame.
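The difference between the two modes is easy to miss: without inplace=True the original is untouched, and with it the call returns None. A short sketch on a hypothetical frame that checks both behaviours:

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2], 'B': [3, 4], 'C': [5, 6]})

# Default: a new DataFrame is returned and df keeps all its columns.
result = df.drop(columns='A')
columns_after_copy_drop = list(df.columns)

# inplace=True: df itself is modified and the call returns None.
returned = df.drop(columns='B', inplace=True)

print(columns_after_copy_drop)
print(returned)
print(list(df.columns))
```

Because the in-place call returns None, writing df = df.drop(..., inplace=True) would replace the DataFrame with None, which is a common mistake.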
8. Using drop with filter
The filter method selects the columns whose names match a specific pattern. You can then pass those column names to the drop method to remove them.
import pandas as pd
data = {
'pandasdataframe.com_A': [1, 2, 3, 4],
'B': [5, 6, 7, 8],
'pandasdataframe.com_C': [9, 10, 11, 12]
}
df = pd.DataFrame(data)
columns_to_drop = df.filter(like='pandasdataframe.com').columns
df = df.drop(columns=columns_to_drop)
print(df)
Here, columns that contain ‘pandasdataframe.com’ in their names are dropped.
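An equivalent approach builds a boolean mask over the column names and keeps the columns that do not match. The sketch below uses hypothetical column names containing 'tmp' and compares the mask-based result with the filter-then-drop result:

```python
import pandas as pd

df = pd.DataFrame({'tmp_a': [1, 2], 'b': [3, 4], 'c_tmp': [5, 6]})

# filter returns the sub-DataFrame of columns whose names contain 'tmp'.
matched = df.filter(like='tmp')

# Boolean mask over the column names; regex=False means plain substring matching.
mask = df.columns.str.contains('tmp', regex=False)
kept = df.loc[:, ~mask]

print(list(matched.columns))
print(list(kept.columns))
```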
9. Conditional Column Dropping
You may want to drop columns based on certain conditions. This can be achieved by building the list of columns to drop with a list comprehension that applies a custom condition to each column name.
import pandas as pd
data = {
'A': [1, 2, 3, 4],
'pandasdataframe.com_B': [5, 6, 7, 8],
'C': [9, 10, 11, 12],
'pandasdataframe.com_D': [13, 14, 15, 16]
}
df = pd.DataFrame(data)
columns_to_drop = [col for col in df.columns if 'pandasdataframe.com' in col]
df = df.drop(columns=columns_to_drop)
print(df)
In this example, columns containing ‘pandasdataframe.com’ in their names are conditionally dropped.
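A condition can also be evaluated on each column's values rather than on its name. The following is a minimal sketch, assuming the goal is to remove constant columns, that uses DataFrame.apply to compute one boolean per column and then keeps the columns where it is False:

```python
import pandas as pd

df = pd.DataFrame({
    'A': [1, 1, 1],
    'B': [1, 2, 3],
    'C': ['x', 'x', 'x'],
})

# apply calls the function once per column and returns a boolean Series
# indexed by column name: True where the column holds a single distinct value.
is_constant = df.apply(lambda s: s.nunique() <= 1)

# Keep only the columns that are not constant.
cleaned = df.loc[:, ~is_constant]

print(list(cleaned.columns))
```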
10. Dropping Columns by Data Type
Another way to drop columns is based on their data type. You can use the select_dtypes method to include or exclude specific data types.
import pandas as pd
data = {
'A': [1, 2, 3, 4],
'B': [5.5, 6.6, 7.7, 8.8],
'C': ['foo', 'bar', 'baz', 'qux'],
'D': [True, False, True, False]
}
df = pd.DataFrame(data)
df = df.drop(columns=df.select_dtypes(include=['float64']).columns)
print(df)
Here, all columns of type float64 are dropped.
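The exclude argument of select_dtypes reaches the same result in one step, without calling drop. A short sketch on a hypothetical frame with mixed types; the 'number' selector covers integer and floating-point columns but not booleans:

```python
import pandas as pd

df = pd.DataFrame({
    'A': [1, 2],
    'B': [1.5, 2.5],
    'C': ['x', 'y'],
    'D': [True, False],
})

# Return every column except the float64 ones.
no_floats = df.select_dtypes(exclude=['float64'])

# Return every column except the numeric ones (integers and floats).
no_numbers = df.select_dtypes(exclude='number')

print(list(no_floats.columns))
print(list(no_numbers.columns))
```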
11. Dropping Columns with NaN Values
In some datasets, you might want to drop columns that contain any missing values (NaNs). The dropna method can be used for this purpose.
import pandas as pd
import numpy as np
data = {
'A': [1, 2, np.nan, 4],
'B': [5, 6, 7, 8],
'C': [np.nan, np.nan, np.nan, np.nan]
}
df = pd.DataFrame(data)
df = df.dropna(axis=1, how='any')
print(df)
In this example, columns ‘A’ and ‘C’ are dropped because they contain NaN values.
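Dropping on any missing value can be too strict for real data. The thresh argument of dropna keeps a column only if it has at least that many non-missing values; the sketch below uses a hypothetical cutoff of 3:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'A': [1, 2, np.nan, 4],            # 3 non-missing values
    'B': [5, 6, 7, 8],                 # 4 non-missing values
    'C': [np.nan, np.nan, np.nan, 9],  # 1 non-missing value
})

# Keep only the columns with at least 3 non-missing values.
cleaned = df.dropna(axis=1, thresh=3)

print(list(cleaned.columns))
```

Use thresh on its own rather than together with how; recent pandas versions reject calls that set both.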
12. Practical Examples
Let’s explore some practical examples of dropping columns in Pandas. Each example includes detailed explanations and a complete, context-independent Pandas code snippet.
Example 1: Drop a Single Column by Name
import pandas as pd
data = {
'A': [1, 2, 3, 4],
'B': [5, 6, 7, 8],
'C': [9, 10, 11, 12]
}
df = pd.DataFrame(data)
df = df.drop(columns=['B'])
print(df)
In this example, the column ‘B’ is dropped from the DataFrame.
Example 2: Drop a Column by Index
import pandas as pd
data = {
'A': [1, 2, 3, 4],
'B': [5, 6, 7, 8],
'C': [9, 10, 11, 12]
}
df = pd.DataFrame(data)
df = df.drop(df.columns[1], axis=1)
print(df)
Here, the column at index 1 (which is ‘B’) is dropped.
Example 3: Drop Multiple Columns by Name
import pandas as pd
data = {
'A': [1, 2, 3, 4],
'B': [5, 6, 7, 8],
'C': [9, 10, 11, 12],
'D': [13, 14, 15, 16]
}
df = pd.DataFrame(data)
df = df.drop(columns=['B', 'D'])
print(df)
In this example, columns ‘B’ and ‘D’ are dropped from the DataFrame.
Example 4: Drop Columns with the axis Parameter
import pandas as pd
data = {
'A': [1, 2, 3, 4],
'B': [5, 6, 7, 8],
'C': [9, 10, 11, 12]
}
df = pd.DataFrame(data)
df = df.drop('C', axis=1)
print(df)
The column ‘C’ is dropped by specifying axis=1.
Example 5: Drop Column in Place
import pandas as pd
data = {
'A': [1, 2, 3, 4],
'B': [5, 6, 7, 8],
'C': [9, 10, 11, 12]
}
df = pd.DataFrame(data)
df.drop('A', axis=1, inplace=True)
print(df)
Here, the column ‘A’ is dropped in place, modifying the original DataFrame.
Example 6: Drop Columns Matching a Pattern
import pandas as pd
data = {
'pandasdataframe.com_A': [1, 2, 3, 4],
'B': [5, 6, 7, 8],
'pandasdataframe.com_C': [9, 10, 11, 12]
}
df = pd.DataFrame(data)
columns_to_drop = df.filter(like='pandasdataframe.com').columns
df = df.drop(columns=columns_to_drop)
print(df)
Columns containing ‘pandasdataframe.com’ in their names are dropped.
Example 7: Drop Columns Based on Condition
import pandas as pd
data = {
'A': [1, 2, 3, 4],
'pandasdataframe.com_B': [5, 6, 7, 8],
'C': [9, 10, 11, 12],
'pandasdataframe.com_D': [13, 14, 15, 16]
}
df = pd.DataFrame(data)
columns_to_drop = [col for col in df.columns if 'pandasdataframe.com' in col]
df = df.drop(columns=columns_to_drop)
print(df)
Columns containing ‘pandasdataframe.com’ in their names are conditionally dropped.
Example 8: Drop Columns by Data Type
import pandas as pd
data = {
'A': [1, 2, 3, 4],
'B': [5.5, 6.6, 7.7, 8.8],
'C': ['foo', 'bar', 'baz', 'qux'],
'D': [True, False, True, False]
}
df = pd.DataFrame(data)
df = df.drop(columns=df.select_dtypes(include=['float64']).columns)
print(df)
All columns of type float64 are dropped.
Example 9: Drop Columns with NaN Values
import pandas as pd
import numpy as np
data = {
'A': [1, 2, np.nan, 4],
'B': [5, 6, 7, 8],
'C': [np.nan, np.nan, np.nan, np.nan]
}
df = pd.DataFrame(data)
df = df.dropna(axis=1, how='any')
print(df)
Columns ‘A’ and ‘C’ are dropped because they contain NaN values.
Example 10: Drop Columns with All NaN Values
import pandas as pd
import numpy as np
data = {
'A': [1, 2, 3, 4],
'B': [5, 6, 7, 8],
'C': [np.nan, np.nan, np.nan, np.nan]
}
df = pd.DataFrame(data)
df = df.dropna(axis=1, how='all')
print(df)
Column ‘C’ is dropped because it contains all NaN values.
Example 11: Drop Columns with Any NaN Values
import pandas as pd
import numpy as np
data = {
'A': [1, 2, np.nan, 4],
'B': [5, 6, 7, 8],
'C': [9, 10, 11, 12]
}
df = pd.DataFrame(data)
df = df.dropna(axis=1, how='any')
print(df)
Column ‘A’ is dropped because it contains NaN values.
Example 12: Drop Columns Using Regex
import pandas as pd
data = {
'A': [1, 2, 3, 4],
'pandasdataframe.com_B': [5, 6, 7, 8],
'C': [9, 10, 11, 12],
'pandasdataframe.com_D': [13, 14, 15, 16]
}
df = pd.DataFrame(data)
columns_to_drop = df.filter(regex=r'pandasdataframe\.com').columns
df = df.drop(columns=columns_to_drop)
print(df)
Columns whose names match the regex pattern are dropped; the dot is escaped so that it matches a literal period.
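In a regular expression an unescaped . matches any character, so a pattern copied from a literal string can match more columns than intended. The sketch below, using hypothetical column names, contrasts a raw pattern with one built by re.escape from the standard library:

```python
import re

import pandas as pd

df = pd.DataFrame({'a.b_x': [1], 'aXb_y': [2], 'c': [3]})

# Unescaped: '.' matches any character, so 'aXb_y' is matched too.
loose = df.filter(regex='a.b').columns

# Escaped: only names containing the literal text 'a.b' are matched.
strict = df.filter(regex=re.escape('a.b')).columns

print(list(loose))
print(list(strict))
```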
Example 13: Drop Columns with Specific Suffix
import pandas as pd
data = {
'A_suffix': [1, 2, 3, 4],
'B': [5, 6, 7, 8],
'C_suffix': [9, 10, 11, 12]
}
df = pd.DataFrame(data)
columns_to_drop = df.filter(regex='_suffix$').columns
df = df.drop(columns=columns_to_drop)
print(df)
Columns with the suffix ‘_suffix’ are dropped.
Example 14: Drop Columns with Specific Prefix
import pandas as pd
data = {
'prefix_A': [1, 2, 3, 4],
'B': [5, 6, 7, 8],
'prefix_C': [9, 10, 11, 12]
}
df = pd.DataFrame(data)
columns_to_drop = df.filter(regex='^prefix_').columns
df = df.drop(columns=columns_to_drop)
print(df)
Columns with the prefix ‘prefix_’ are dropped.
Example 15: Drop Columns with Length Condition
import pandas as pd
data = {
'A': [1, 2, 3, 4],
'BB': [5, 6, 7, 8],
'CCC': [9, 10, 11, 12]
}
df = pd.DataFrame(data)
columns_to_drop = [col for col in df.columns if len(col) > 1]
df = df.drop(columns=columns_to_drop)
print(df)
Columns with names longer than 1 character are dropped.
Example 16: Drop Columns with Value Condition
import pandas as pd
data = {
'A': [1, 2, 3, 4],
'B': [5, 6, 7, 8],
'C': [9, 10, 11, 12]
}
df = pd.DataFrame(data)
columns_to_drop = [col for col in df.columns if df[col].sum() > 20]
df = df.drop(columns=columns_to_drop)
print(df)
Columns where the sum of values is greater than 20 are dropped.
Example 17: Drop Columns Based on Row Values
import pandas as pd
data = {
'A': [1, 2, 3, 4],
'B': [5, 6, 7, 8],
'C': [9, 10, 11, 12]
}
df = pd.DataFrame(data)
columns_to_drop = [col for col in df.columns if (df[col] > 10).any()]
df = df.drop(columns=columns_to_drop)
print(df)
Columns containing any value greater than 10 are dropped.
Example 18: Drop Columns with Specific Data Type
import pandas as pd
data = {
'A': [1, 2, 3, 4],
'B': [5.5, 6.6, 7.7, 8.8],
'C': ['foo', 'bar', 'baz', 'qux']
}
df = pd.DataFrame(data)
columns_to_drop = df.select_dtypes(include=['float64']).columns
df = df.drop(columns=columns_to_drop)
print(df)
Columns of type float64 are dropped.
Example 19: Drop Columns with Specific Substring
import pandas as pd
data = {
'pandasdataframe.com_A': [1, 2, 3, 4],
'B': [5, 6, 7, 8],
'pandasdataframe.com_C': [9, 10, 11, 12]
}
df = pd.DataFrame(data)
columns_to_drop = [col for col in df.columns if 'pandasdataframe.com' in col]
df = df.drop(columns=columns_to_drop)
print(df)
Columns containing the substring ‘pandasdataframe.com’ are dropped.
Example 20: Drop Columns Using a Custom Function
import pandas as pd
data = {
'A': [1, 2, 3, 4],
'B': [5, 6, 7, 8],
'C': [9, 10, 11, 12]
}
df = pd.DataFrame(data)
def custom_condition(col):
    # Return True when the mean of the named column exceeds 5.
    return df[col].mean() > 5
columns_to_drop = [col for col in df.columns if custom_condition(col)]
df = df.drop(columns=columns_to_drop)
print(df)
Columns where the mean of values is greater than 5 are dropped using a custom function.
By using the methods and examples provided in this article, you can efficiently and effectively manage columns in your Pandas DataFrames. Whether you need to drop columns by name, index, condition, or data type, Pandas offers flexible options to meet your data manipulation needs.