
    Page

  • What Is Pandas?
  • Installing Pandas
  • Series and DataFrames
  • Reading and Writing Data
  • Inspecting Data
  • Filtering and Selecting Data
  • Basic Aggregations
  • Adding and Removing Columns
  • Best Practices
  • Common Misconceptions
  • Mini Project Step

20. Introduction to Data Analysis with Pandas

Level: Advanced
Duration: 42m

What Is Pandas?

Pandas is a powerful Python library used for data analysis and manipulation. It provides easy-to-use data structures like Series (1D) and DataFrames (2D) to work with structured data, such as CSV files, Excel sheets, or SQL tables.

Installing Pandas

bash
pip install pandas

Make sure Pandas is installed in the environment you run your code from. `numpy` is installed automatically as a dependency of Pandas; you can also install `matplotlib` alongside for visualizations.
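
A quick way to confirm the install worked is to import the library and look at its version string. This is a minimal check, and the exact version printed depends on your environment.

```python
import pandas as pd

# pd.__version__ is a plain string; its value depends on what pip installed
installed_version = pd.__version__
print(installed_version)
```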

Series and DataFrames

A Series is like a column of data, while a DataFrame is like a table with rows and columns. DataFrames are the most commonly used structure in Pandas.

python
import pandas as pd

# Creating a Series
ages = pd.Series([25, 30, 22, 40])
print(ages)

# Creating a DataFrame
data = {'Name': ['Alice', 'Bob', 'Charlie'], 'Age': [25, 30, 22]}
df = pd.DataFrame(data)
print(df)

Reading and Writing Data

python
# Read CSV
csv_df = pd.read_csv('data.csv')
# Write CSV
df.to_csv('output.csv', index=False)
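
If you do not have a `data.csv` file at hand, the same calls can be tried against an in-memory buffer. The sketch below is only an illustration: the CSV text and the names `csv_text`, `people`, and `buffer` are made up for this example and are not part of the lesson's data.

```python
import io
import pandas as pd

# Hypothetical CSV content standing in for a file such as 'data.csv'
csv_text = "Name,Age\nAlice,25\nBob,30\nCharlie,22\n"

# read_csv accepts a file path or any file-like object
people = pd.read_csv(io.StringIO(csv_text))
print(people)

# to_csv accepts a path or a buffer; index=False leaves out the row labels
buffer = io.StringIO()
people.to_csv(buffer, index=False)
round_trip = buffer.getvalue()
print(round_trip)
```

Without `index=False`, the output would gain an extra unnamed first column holding the row labels.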

Inspecting Data

python
print(df.head())       # first 5 rows
df.info()              # prints its summary itself and returns None, so no print() needed
print(df.describe())   # stats for numeric columns

Filtering and Selecting Data

python
# Select a column
print(df['Name'])

# Filter rows
adults = df[df['Age'] >= 25]
print(adults)
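
The filter above works because the comparison produces a boolean Series that is then used to pick rows. A small sketch of that mechanism, reusing the lesson's example data (the names `mask`, `late_twenties`, and `adult_names` are illustrative):

```python
import pandas as pd

df = pd.DataFrame({'Name': ['Alice', 'Bob', 'Charlie'], 'Age': [25, 30, 22]})

# The comparison yields one True/False value per row
mask = df['Age'] >= 25
adults = df[mask]

# Combine conditions with & (and) or | (or); each condition needs its own parentheses
late_twenties = df[(df['Age'] >= 25) & (df['Age'] < 30)]

# .loc filters rows and picks a column in a single step
adult_names = df.loc[df['Age'] >= 25, 'Name']

print(mask)
print(late_twenties)
print(adult_names)
```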

Basic Aggregations

python
print(df['Age'].mean())
print(df['Age'].sum())
print(df['Age'].max())
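
When you want several statistics at once, `agg` can compute them in one call. A brief sketch on the same example data; the variable names are illustrative.

```python
import pandas as pd

df = pd.DataFrame({'Name': ['Alice', 'Bob', 'Charlie'], 'Age': [25, 30, 22]})

total_age = df['Age'].sum()
oldest = df['Age'].max()

# agg with a list of names returns a Series indexed by statistic name
summary = df['Age'].agg(['mean', 'sum', 'max'])
print(summary)
```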

Adding and Removing Columns

python
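# Add a column computed from an existing one
df['Age_in_5_years'] = df['Age'] + 5
print(df)

# Remove it again; drop() returns a new DataFrame, so assign the result back
df = df.drop(columns='Age_in_5_years')
print(df)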

Best Practices

  • Always inspect data before analysis using `head()`, `info()`, and `describe()`.
  • Use vectorized operations instead of loops for performance.
  • Keep column names consistent and readable.
  • Chain operations carefully: assigning through a chained selection such as `df[mask]['col'] = value` may modify a temporary copy rather than `df`.
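
To make the points about vectorized operations and chaining concrete, here is a small sketch on the lesson's example data. It is one possible style rather than the only correct one, and the names `looped`, `vectorized`, and `result` are illustrative.

```python
import pandas as pd

df = pd.DataFrame({'Name': ['Alice', 'Bob', 'Charlie'], 'Age': [25, 30, 22]})

# Loop version: correct, but runs Python code once per row
looped = []
for age in df['Age']:
    looped.append(age + 5)

# Vectorized version: one expression applied to the whole column
vectorized = df['Age'] + 5

# A chain in which every step returns a new DataFrame, so df itself is not changed
result = (
    df.assign(Age_in_5_years=df['Age'] + 5)
      .loc[lambda d: d['Age_in_5_years'] > 28]
      .sort_values('Age')
)
print(result)
```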

Common Misconceptions

  • Pandas is not just for CSVs — it can handle Excel, SQL, JSON, and more.
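  • Most DataFrame operations return a new object rather than modifying the original; changes persist only if you assign the result back or pass `inplace=True` where it is supported.
  • Indexing can be tricky: `loc` selects by label, while `iloc` selects by integer position.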
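
The last two points are easier to see with a DataFrame whose row labels are not 0, 1, 2. In the sketch below, the index values 10, 20, 30 are an assumption chosen only to make labels and positions differ.

```python
import pandas as pd

df = pd.DataFrame(
    {'Name': ['Alice', 'Bob', 'Charlie'], 'Age': [25, 30, 22]},
    index=[10, 20, 30],  # hypothetical row labels
)

by_label = df.loc[10, 'Name']      # row labelled 10, column 'Name'
by_position = df.iloc[0, 0]        # first row, first column

label_slice = df.loc[10:20]        # label slices include both endpoints
position_slice = df.iloc[0:1]      # position slices exclude the end

# drop() returns a new DataFrame; df keeps its 'Age' column
dropped = df.drop(columns='Age')
print(df.columns.tolist())
print(dropped.columns.tolist())
```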

Pandas Official Documentation

💡 Think of Pandas as Excel in Python: you can read data, analyze it, filter, and perform calculations — all programmatically.

Mini Project Step

Load a CSV file with sales data. Calculate total sales per product, filter products above a certain threshold, and create a new column showing the discounted price (10% off).
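
One possible way to approach these steps is sketched below. Everything specific in it is an assumption made for illustration: the inline CSV text stands in for your own file, the column names `product`, `price`, and `quantity` are invented, and the threshold of 40 is arbitrary. Adapt the names to whatever your sales data actually contains.

```python
import io
import pandas as pd

# Hypothetical sales data; replace with pd.read_csv('your_file.csv')
csv_text = """product,price,quantity
Widget,10.0,3
Gadget,25.0,2
Widget,10.0,5
Gizmo,4.0,1
"""
sales = pd.read_csv(io.StringIO(csv_text))

# Revenue per row, then total per product
sales['sales'] = sales['price'] * sales['quantity']
totals = sales.groupby('product', as_index=False)['sales'].sum()

# Keep only products whose total exceeds an (arbitrary) threshold
threshold = 40.0
top_products = totals[totals['sales'] > threshold]

# New column with the price after a 10% discount
sales['discounted_price'] = sales['price'] * 0.9

print(totals)
print(top_products)
print(sales)
```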