Pandas DataFrame

Pandas DataFrame

Pandas is an open-source Python library that provides high-performance, easy-to-use data structures and data analysis tools. Among its core offerings, the DataFrame is one of the most important data structures in Pandas. It is essentially a two-dimensional labeled data structure with columns of potentially different types of data. In this article, we will explore the various functionalities of the Pandas DataFrame with comprehensive examples.

1. Creating a Pandas DataFrame

One of the most basic operations when working with Pandas is creating a DataFrame. You can create a DataFrame from various data sources like lists, dictionaries, and even from external data files.

Example 1: Creating a Pandas DataFrame from a List

import pandas as pd

data = [['Alex', 10], ['Bob', 12], ['Clarke', 13]]
df = pd.DataFrame(data, columns=['Name', 'Age'])
print(df)

Output:

Pandas DataFrame

Example 2: Creating a Pandas DataFrame from a Dictionary

import pandas as pd

data = {'Name': ['Tom', 'Jack', 'Steve', 'Ricky'],
        'Age': [28, 34, 29, 42]}
df = pd.DataFrame(data)
print(df)

Output:

Pandas DataFrame

2. Reading and Writing Data in Pandas DataFrame

Pandas supports reading from and writing to different data formats including CSV, Excel, SQL databases, and more.

Example 3: Reading Data from CSV

import pandas as pd

df = pd.read_csv('pandasdataframe.com_data.csv')
print(df)

Example 4: Writing Data to Excel

import pandas as pd

df = pd.DataFrame({'Data': [10, 20, 30, 20, 15, 30, 45]})
df.to_excel('pandasdataframe.com_output.xlsx', index=False)

3. Data Selection, Addition, and Deletion in Pandas DataFrame

Manipulating data is central to using Pandas effectively. You can select, add, or delete data in a DataFrame.

Example 5: Selecting Data by Column in Pandas DataFrame

import pandas as pd

data = {'Name': ['Tom', 'Jerry', 'Mickey', 'Donald'],
        'Age': [25, 30, 35, 40]}
df = pd.DataFrame(data)
print(df['Name'])

Output:

Pandas DataFrame

Example 6: Adding a New Column in Pandas DataFrame

import pandas as pd

df = pd.DataFrame({'Name': ['Tom', 'Jack', 'Steve', 'Ricky'],
                   'Age': [28, 34, 29, 42]})
df['Salary'] = [1000, 1500, 1200, 1800]
print(df)

Output:

Pandas DataFrame

Example 7: Deleting a Column in Pandas DataFrame

import pandas as pd

data = {'Name': ['Tom', 'Jerry', 'Mickey', 'Donald'],
        'Age': [25, 30, 35, 40]}
df = pd.DataFrame(data)
df.drop('Age', axis=1, inplace=True)
print(df)

Output:

Pandas DataFrame

4. Handling Missing Data in Pandas DataFrame

Handling missing data is another critical part of data analysis. Pandas provides several methods to deal with missing data in DataFrames.

Example 8: Filling Missing Data in Pandas DataFrame

import pandas as pd
import numpy as np

df = pd.DataFrame({'First Score': [100, 90, np.nan, 95],
                   'Second Score': [30, 45, 56, np.nan],
                   'Third Score': [np.nan, 40, 80, 98]})
df.fillna(0, inplace=True)
print(df)

Output:

Pandas DataFrame

Example 9: Dropping Rows with Missing Data in Pandas DataFrame

import pandas as pd
import numpy as np

df = pd.DataFrame({'First Score': [100, 90, np.nan, 95],
                   'Second Score': [30, 45, 56, np.nan],
                   'Third Score': [np.nan, 40, 80, 98]})
df.dropna(inplace=True)
print(df)

Output:

Pandas DataFrame

5. Pandas DataFrame Operations

Pandas DataFrames support a variety of operations that can be very useful in data analysis and manipulation.

Example 10: Performing Arithmetic Operations in Pandas DataFrame

import pandas as pd

df = pd.DataFrame({
    'A': [14, 4, 5, 4, 1],
    'B': [5, 2, 54, 3, 2],
    'C': [20, 20, 7, 3, 8],
    'D': [14, 3, 6, 2, 6]})
df['A'] = df['A'] + 10
print(df)

Output:

Pandas DataFrame

Example 11: Applying Functions to Data in Pandas DataFrame

import pandas as pd

df = pd.DataFrame({
    'A': [14, 4, 5, 4, 1],
    'B': [5, 2, 54, 3, 2],
    'C': [20, 20, 7, 3, 8],
    'D': [14, 3, 6, 2, 6]})
df = df.apply(lambda x: x + 10)
print(df)

Output:

Pandas DataFrame

6. Grouping and Aggregating Data in Pandas DataFrame

Grouping and aggregating data is essential when you are dealing with large datasets or when you need to perform operations on categorized data.

Example 12: Grouping Data by Column in Pandas DataFrame

import pandas as pd

data = {'Company': ['Google', 'Google', 'Microsoft', 'Microsoft', 'Facebook', 'Facebook'],
        'Person': ['Sam', 'Charlie', 'Amy', 'Vanessa', 'Carl', 'Sarah'],
        'Sales': [200, 120, 340, 124, 243, 350]}
df = pd.DataFrame(data)
grouped_df = df.groupby('Company')
print(grouped_df.mean())

Example 13: Aggregating Data Using Multiple Functions in Pandas DataFrame

import pandas as pd

data = {'Company': ['Google', 'Google', 'Microsoft', 'Microsoft', 'Facebook', 'Facebook'],
        'Person': ['Sam', 'Charlie', 'Amy', 'Vanessa', 'Carl', 'Sarah'],
        'Sales': [200, 120, 340, 124, 243, 350]}
df = pd.DataFrame(data)
grouped_df = df.groupby('Company').agg({'Sales': ['mean', 'sum']})
print(grouped_df)

Output:

Pandas DataFrame

7. Merging, Joining, and Concatenating in Pandas DataFrame

Combining multiple datasets is a common task in data analysis. Pandas provides multiple ways to merge, join, or concatenate DataFrames.

Example 14: Concatenating Pandas DataFrames

import pandas as pd

df1 = pd.DataFrame({
    'A': ['A0', 'A1', 'A2', 'A3'],
    'B': ['B0', 'B1', 'B2', 'B3'],
    'C': ['C0', 'C1', 'C2', 'C3'],
    'D': ['D0', 'D1', 'D2', 'D3']},
    index=[0, 1, 2, 3])

df2 = pd.DataFrame({
    'A': ['A4', 'A5', 'A6', 'A7'],
    'B': ['B4', 'B5', 'B6', 'B7'],
    'C': ['C4', 'C5', 'C6', 'C7'],
    'D': ['D4', 'D5', 'D6', 'D7']},
    index=[4, 5, 6, 7])

result = pd.concat([df1, df2])
print(result)

Output:

Pandas DataFrame

Example 15: Merging Pandas DataFrames on Key

import pandas as pd

left = pd.DataFrame({
    'key': ['K0', 'K1', 'K2', 'K3'],
    'A': ['A0', 'A1', 'A2', 'A3'],
    'B': ['B0', 'B1', 'B2', 'B3']})

right = pd.DataFrame({
    'key': ['K0', 'K1', 'K2', 'K3'],
    'C': ['C0', 'C1', 'C2', 'C3'],
    'D': ['D0', 'D1', 'D2', 'D3']})

result = pd.merge(left, right, on='key')
print(result)

Output:

Pandas DataFrame

Example 16: Joining Pandas DataFrames

import pandas as pd

left = pd.DataFrame({
    'A': ['A0', 'A1', 'A2'],
    'B': ['B0', 'B1', 'B2']},
    index=['K0', 'K1', 'K2'])

right = pd.DataFrame({
    'C': ['C0', 'C1', 'C2'],
    'D': ['D0', 'D1', 'D2']},
    index=['K0', 'K2', 'K3'])

result = left.join(right)
print(result)

Output:

Pandas DataFrame

8. Advanced Pandas DataFrame Operations

As you become more familiar with Pandas, you’ll encounter situations that require advanced operations such as pivoting, melting, and applying custom transformations.

Example 17: Pivoting Pandas DataFrames

Pivoting a table is a common operation in data analysis that involves transforming data from long format to wide format.

import pandas as pd

data = {'Date': ['2023-01-01', '2023-01-01', '2023-01-02', '2023-01-02'],
        'Type': ['A', 'B', 'A', 'B'],
        'Value': [100, 200, 300, 400]}
df = pd.DataFrame(data)
pivot_df = df.pivot(index='Date', columns='Type', values='Value')
print(pivot_df)

Output:

Pandas DataFrame

Example 18: Melting Pandas DataFrames

Melting is the opposite of pivoting and changes the DataFrame from wide format to long format.

import pandas as pd

df = pd.DataFrame({
    'Date': ['2023-01-01', '2023-01-02'],
    'A': [100, 300],
    'B': [200, 400]})
melted_df = pd.melt(df, id_vars=['Date'], value_vars=['A', 'B'])
print(melted_df)

Output:

Pandas DataFrame

Example 19: Applying Custom Transformations Using applymap

The applymap function allows you to apply a function to every single element in the DataFrame.

import pandas as pd

df = pd.DataFrame({
    'A': [1, 2, 3],
    'B': [4, 5, 6],
    'C': [7, 8, 9]})
df = df.applymap(lambda x: x**2)
print(df)

Example 20: Filtering Data in Pandas DataFrame

Filtering allows you to select a subset of rows that meet certain criteria.

import pandas as pd

df = pd.DataFrame({
    'Name': ['Tom', 'Nick', 'Krish', 'Jack'],
    'Age': [20, 21, 19, 18]})
filtered_df = df[df['Age'] > 19]
print(filtered_df)

Output:

Pandas DataFrame

9. Time Series Data

Pandas is particularly strong in working with time series data. You can perform various operations specific to time series data like resampling, time shifts, and window functions.

Example 21: Resampling Time Series Data

Resampling involves changing the frequency of your time series observations.

import pandas as pd
import numpy as np

idx = pd.date_range('20230101', periods=60, freq='D')
ts = pd.Series(range(len(idx)), index=idx)
resampled_ts = ts.resample('M').mean()
print(resampled_ts)

Example 22: Time Shifting

Time shifting lets you move data forward or backward in time.

import pandas as pd

df = pd.DataFrame({
    'sales': [3, 5, 2, 6],
    'signups': [5, 5, 6, 12],
    'visits': [20, 42, 28, 62]},
    index=pd.date_range(start='2023-01-01', periods=4))
df_shifted = df.shift(1)
print(df_shifted)

Output:

Pandas DataFrame

Example 23: Window Functions

Window functions are useful for calculating rolling statistics.

import pandas as pd

s = pd.Series([1, 2, 3, 4, 5, 6, 7, 8, 9, 10])
rolling_mean = s.rolling(window=3).mean()
print(rolling_mean)

Output:

Pandas DataFrame

Pandas DataFrame Conclusion

Pandas DataFrame is a powerful tool for data manipulation and analysis. By understanding how to effectively use its various functionalities, you can perform a wide range of data analysis tasks more efficiently. This article has provided a comprehensive overview of the key features of Pandas DataFrame along with practical examples to help you get started with data analysis in Python. Whether you are dealing with small datasets or large, structured or unstructured data, Pandas offers the functionality you need to clean, transform, and analyze your data.