Pandas astype float to int
Pandas is a powerful data manipulation library in Python that allows for extensive operations on data sets, including data type conversions. One common task in data preprocessing is converting data types from float to integer. This conversion is often necessary when dealing with numerical data that originally contains decimals but needs to be transformed into whole numbers for analysis or reporting purposes.
This article walks through several ways to convert float columns to integers in Pandas, each with a complete, standalone code snippet. The examples differ mainly in how they treat the fractional part (truncate, round, floor, or ceiling) and in how they deal with missing or non-numeric values.
Example 1: Basic Conversion Using astype(int)
import pandas as pd
# Create a DataFrame
df = pd.DataFrame({
'A': [1.0, 2.2, 3.5, 4.8],
'B': [5.9, 6.1, 7.2, 8.3]
})
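# Convert float to int (the fractional part is dropped, not rounded)
df['A'] = df['A'].astype(int)
print(df)
Output: column A becomes 1, 2, 3, 4. Note that 3.5 and 4.8 are truncated to 3 and 4 rather than rounded up. Column B is unchanged.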
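To make the truncation behavior concrete, here is a minimal sketch showing that astype(int) drops the fractional part toward zero, for negative numbers as well, and how one might flag the rows that lose information in the cast. The Series values and the names truncated and lossy are illustrative, not part of the example above.

```python
import pandas as pd

# Illustrative values chosen to include negatives and a whole number
s = pd.Series([-1.7, -0.2, 1.0, 2.9])

# astype(int) truncates toward zero: -1.7 -> -1, 2.9 -> 2
truncated = s.astype(int)

# Rows whose value changed lost a fractional part in the cast
lossy = s != truncated

print(truncated.tolist())  # [-1, 0, 1, 2]
print(lossy.tolist())      # [True, True, False, True]
```

A check like this can be useful before converting a column that is expected to hold whole numbers only.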
Example 2: Handling NaN Values Before Conversion
import pandas as pd
# Create a DataFrame with NaN values
df = pd.DataFrame({
'A': [1.0, 2.2, None, 4.8],
'B': [5.9, 6.1, 7.2, None]
})
# Fill NaN values with 0 and convert to int
df['A'] = df['A'].fillna(0).astype(int)
print(df)
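Output: column A becomes 1, 2, 0, 4, with the missing value replaced by 0 before the cast. Column B is untouched and still contains NaN in its last row.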
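Filling with 0 is only appropriate when 0 is a sensible stand-in for "missing". As an alternative, the sketch below uses pandas' nullable "Int64" dtype, which can hold integers alongside missing values. It assumes truncation is the desired rounding rule (hence np.trunc); the variable names are illustrative.

```python
import numpy as np
import pandas as pd

# Same values as column A above, with one missing entry
s = pd.Series([1.0, 2.2, None, 4.8])

# Casting to "Int64" requires whole-number floats, so truncate first;
# the missing entry becomes <NA> instead of being replaced by 0
nullable = np.trunc(s).astype("Int64")

print(nullable.dtype)         # Int64
print(nullable.isna().sum())  # 1
```

Note the capital "I": "Int64" is the nullable extension dtype, while "int64" is the plain NumPy dtype that cannot store missing values.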
Example 3: Using pd.to_numeric() for Conversion
import pandas as pd
# Create a DataFrame
df = pd.DataFrame({
'A': ['1.1', '2.2', '3.3', '4.4'],
'B': ['5.5', '6.6', '7.7', '8.8']
})
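# Parse the strings as floats, then truncate to int.
# Note: errors='coerce' turns unparseable strings into NaN, which astype(int) rejects.
df['A'] = pd.to_numeric(df['A'], errors='coerce').astype(int)
print(df)
Output: column A becomes the integers 1, 2, 3, 4. Column B still holds the original strings.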
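The example above works because every string parses cleanly. The following sketch shows what happens when one does not, and one possible way to handle it (dropping the failed rows). The input values and the names raw, parsed, and clean are hypothetical.

```python
import pandas as pd

# Hypothetical column with one unparseable entry
raw = pd.Series(["1.1", "abc", "3.3"])

# errors="coerce" turns "abc" into NaN rather than raising
parsed = pd.to_numeric(raw, errors="coerce")

# A plain integer cast cannot represent NaN, so it fails
try:
    parsed.astype(int)
    cast_failed = False
except ValueError:
    cast_failed = True

# One option: drop the rows that failed to parse, then cast
clean = parsed.dropna().astype(int)

print(cast_failed)     # True
print(clean.tolist())  # [1, 3]
```

Whether to drop, fill, or keep the missing entries depends on the data; dropping is only one of the choices.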
Example 4: Rounding Before Conversion
import pandas as pd
# Create a DataFrame
df = pd.DataFrame({
'A': [1.6, 2.5, 3.1, 4.9],
'B': [5.3, 6.7, 7.8, 8.2]
})
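# Round to the nearest whole number, then convert to int
df['A'] = df['A'].round().astype(int)
print(df)
Output: column A becomes 2, 2, 3, 5. The value 2.5 rounds to 2 because round() sends exact ties to the nearest even number. Column B is unchanged.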
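Because Series.round() resolves exact .5 ties toward the even neighbor, results can differ from the "round half up" rule many people expect. The sketch below compares the two on tie values only. The floor(x + 0.5) idiom is a common workaround, not a pandas feature, and the variable names are illustrative.

```python
import numpy as np
import pandas as pd

# Ties only; all of these values are exactly representable in binary
s = pd.Series([0.5, 1.5, 2.5, 3.5])

# Series.round() sends ties to the nearest even integer
half_even = s.round().astype(int)

# Adding 0.5 and flooring sends ties upward instead
half_up = np.floor(s + 0.5).astype(int)

print(half_even.tolist())  # [0, 2, 2, 4]
print(half_up.tolist())    # [1, 2, 3, 4]
```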
Example 5: Using numpy.floor() for Conversion
import pandas as pd
import numpy as np
# Create a DataFrame
df = pd.DataFrame({
'A': [1.9, 2.8, 3.7, 4.6],
'B': [5.5, 6.4, 7.3, 8.2]
})
# Apply floor and convert to int
df['A'] = np.floor(df['A']).astype(int)
print(df)
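Output: column A becomes 1, 2, 3, 4. For positive values this matches plain astype(int); the two differ only for negative numbers, where floor rounds away from zero.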
Example 6: Using numpy.ceil() for Conversion
import pandas as pd
import numpy as np
# Create a DataFrame
df = pd.DataFrame({
'A': [1.1, 2.2, 3.3, 4.4],
'B': [5.5, 6.6, 7.7, 8.8]
})
# Apply ceil and convert to int
df['A'] = np.ceil(df['A']).astype(int)
print(df)
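Output: column A becomes 2, 3, 4, 5, since each value is rounded up to the next whole number. Column B is unchanged.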
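The difference between floor, ceiling, and plain truncation only becomes visible once negative values are involved. This sketch puts the three side by side; the input values and the comparison frame are illustrative.

```python
import numpy as np
import pandas as pd

s = pd.Series([-1.5, -0.5, 0.5, 1.5])

comparison = pd.DataFrame({
    "value": s,
    "floor": np.floor(s).astype(int),  # toward negative infinity
    "ceil": np.ceil(s).astype(int),    # toward positive infinity
    "trunc": s.astype(int),            # toward zero
})
print(comparison)
```

Here floor yields -2, -1, 0, 1, ceil yields -1, 0, 1, 2, and truncation yields -1, 0, 0, 1.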
Example 7: Converting Multiple Columns Simultaneously
import pandas as pd
# Create a DataFrame
df = pd.DataFrame({
'A': [1.0, 2.2, 3.5, 4.8],
'B': [5.9, 6.1, 7.2, 8.3],
'C': [9.4, 10.5, 11.6, 12.7]
})
# Convert multiple columns
df = df.astype({'A': int, 'B': int, 'C': int})
print(df)
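Output: A becomes 1, 2, 3, 4; B becomes 5, 6, 7, 8; C becomes 9, 10, 11, 12. All three columns are truncated in a single astype call.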
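Listing every column by hand does not scale to wide frames. As a sketch, the mapping passed to astype can be built from select_dtypes so that only float columns are converted and other columns are left alone. The "label" column and the variable names are hypothetical.

```python
import pandas as pd

df = pd.DataFrame({
    "A": [1.0, 2.2, 3.5],
    "B": [5.9, 6.1, 7.2],
    "label": ["x", "y", "z"],  # hypothetical non-numeric column
})

# Collect the float64 columns, then build the mapping astype() expects
float_cols = df.select_dtypes(include="float64").columns
converted = df.astype({col: "int64" for col in float_cols})

print(converted.dtypes)
```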
Example 8: Handling Large DataFrames
import pandas as pd
# Generate a large DataFrame
data = {'A': [x + 0.5 for x in range(1000000)],
'B': [x + 1.5 for x in range(1000000)]}
df = pd.DataFrame(data)
# Convert to int
df = df.astype(int)
print(df)
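Output: pandas prints only the first and last rows of a frame this large. Column A holds 0 through 999999 and column B holds 1 through 1000000, since the .5 fractions are truncated.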
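With large frames, the choice of integer width can matter for memory. The sketch below compares 64-bit and 32-bit targets on a smaller frame built the same way. It assumes every value fits in 32 bits, which holds here but should be verified on real data; the variable names are illustrative.

```python
import numpy as np
import pandas as pd

# Smaller than the example above to keep the sketch quick
n = 100_000
df = pd.DataFrame({"A": np.arange(n) + 0.5, "B": np.arange(n) + 1.5})

as_int64 = df.astype("int64")
as_int32 = df.astype("int32")  # safe here: all values fit in 32 bits

# Bytes used by the column data, excluding the index
bytes_int64 = as_int64.memory_usage(index=False).sum()
bytes_int32 = as_int32.memory_usage(index=False).sum()

print(bytes_int64, bytes_int32)  # 1600000 800000
```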
Example 9: Conversion with Conditional Logic
import pandas as pd
# Create a DataFrame
df = pd.DataFrame({
'A': [1.5, 2.5, 3.5, 4.5],
'B': [5.5, 6.5, 7.5, 8.5]
})
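# Truncate values greater than 2; leave the others as they are
df['A'] = df['A'].apply(lambda x: int(x) if x > 2 else x)
print(df)
Output: column A prints as 1.5, 2.0, 3.0, 4.0. Because the untouched 1.5 is still a float, pandas stores the whole column as float64, so the truncated values appear with a trailing .0.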
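Since a column holds a single dtype, conditional truncation like this yields a float column rather than a true integer one. The sketch below confirms that and shows a vectorized equivalent using Series.where, which avoids the per-element lambda. The names applied and vectorized are illustrative.

```python
import numpy as np
import pandas as pd

s = pd.Series([1.5, 2.5, 3.5, 4.5])

# Element-wise version, as in the example above
applied = s.apply(lambda x: int(x) if x > 2 else x)

# Vectorized equivalent: keep values <= 2, truncate the rest
vectorized = s.where(s <= 2, np.trunc(s))

print(applied.dtype)        # float64
print(vectorized.tolist())  # [1.5, 2.0, 3.0, 4.0]
```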
These examples cover the main ways to convert float columns to integers in Pandas: plain truncation with astype(int), explicit rounding, floor, or ceiling before the cast, and cleaning up missing or string values first. Choose the one whose rounding rule matches what the numbers mean in your data.