20. Introduction to Data Analysis with Pandas
Level: Advanced
Duration: 42m
What Is Pandas?
Pandas is a Python library for data analysis and manipulation. It provides two core data structures, Series (1D) and DataFrame (2D), for working with structured data such as CSV files, Excel sheets, or SQL tables.
Installing Pandas
bash
pip install pandas
Make sure Pandas is installed in the environment you run your code in. You can also install `matplotlib` alongside it for visualizations; `numpy`, used for numeric operations, is installed automatically as a Pandas dependency.
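One quick way to confirm the installation worked is to import the library and print its version. This is a minimal check, not part of the original lesson; the variable name `installed_version` is only illustrative.

```python
# Confirm that Pandas can be imported in the current environment
import pandas as pd

# pd.__version__ holds the installed version as a string
installed_version = pd.__version__
print(f"Pandas version: {installed_version}")
```

If the import fails with `ModuleNotFoundError`, the install most likely went into a different Python environment than the one running the script.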
Series and DataFrames
A Series is like a column of data, while a DataFrame is like a table with rows and columns. DataFrames are the most commonly used structure in Pandas.
python
import pandas as pd
# Creating a Series
ages = pd.Series([25, 30, 22, 40])
print(ages)
# Creating a DataFrame
data = {'Name': ['Alice', 'Bob', 'Charlie'], 'Age': [25, 30, 22]}
df = pd.DataFrame(data)
print(df)
Reading and Writing Data
python
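# Read a CSV file into a DataFrame ('data.csv' is a placeholder path)
csv_df = pd.read_csv('data.csv')
# Write a DataFrame to CSV; index=False leaves out the row index column
df.to_csv('output.csv', index=False)
Inspecting Data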
python
print(df.head()) # first 5 rows
df.info() # summary info; prints directly and returns None, so no print() needed
print(df.describe()) # stats for numeric columns
Filtering and Selecting Data
python
# Select a column
print(df['Name'])
# Filter rows with a boolean condition: keep rows where Age is at least 25
adults = df[df['Age'] >= 25]
print(adults)
Basic Aggregations
python
print(df['Age'].mean()) # average age
print(df['Age'].sum()) # total of all ages
print(df['Age'].max()) # highest age
Adding and Removing Columns
python
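# Add a column computed from an existing one
df['Age_in_5_years'] = df['Age'] + 5
print(df)
# Remove the column; inplace=True modifies df directly instead of returning a new DataFrame
df.drop('Age_in_5_years', axis=1, inplace=True)
print(df)
Best Practices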
- Always inspect data before analysis using `head()`, `info()`, and `describe()`.
- Use vectorized operations instead of loops for performance.
- Keep column names consistent and readable.
- Chain operations carefully to avoid unintended side-effects.
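To illustrate the point about vectorized operations, the sketch below computes the same result twice: once with an explicit Python loop and once with a single column expression. It reuses the lesson's sample names and ages; the variable names `looped` and `vectorized` are illustrative.

```python
import pandas as pd

# Same sample data as in the lesson's DataFrame example
df = pd.DataFrame({'Name': ['Alice', 'Bob', 'Charlie'], 'Age': [25, 30, 22]})

# Loop version: one Python-level iteration per row
looped = []
for age in df['Age']:
    looped.append(age + 5)

# Vectorized version: a single expression applied to the whole column
vectorized = df['Age'] + 5

# Both approaches produce the same values
print(vectorized.tolist())            # [30, 35, 27]
print(looped == vectorized.tolist())  # True
```

On three rows the difference is negligible, but the vectorized form is shorter and tends to scale much better as the number of rows grows.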
Common Misconceptions
- Pandas is not just for CSVs — it can handle Excel, SQL, JSON, and more.
- Most DataFrame operations return a new object and leave the original unchanged unless you assign the result back or pass `inplace=True`.
- Indexing can be tricky: `loc` selects by label, while `iloc` selects by integer position.
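The last two points can be seen in a small sketch. It assumes a DataFrame with string row labels (`'a'`, `'b'`, `'c'`), chosen here only so that labels and positions are easy to tell apart.

```python
import pandas as pd

# Sample data with string labels as the row index (labels are illustrative)
df = pd.DataFrame(
    {'Name': ['Alice', 'Bob', 'Charlie'], 'Age': [25, 30, 22]},
    index=['a', 'b', 'c'],
)

# sort_values returns a new DataFrame; df itself keeps its original order
sorted_df = df.sort_values('Age')
print(df['Name'].tolist())         # ['Alice', 'Bob', 'Charlie']
print(sorted_df['Name'].tolist())  # ['Charlie', 'Alice', 'Bob']

# loc selects by label; iloc selects by integer position
print(sorted_df.loc['a', 'Name'])  # Alice (the row labelled 'a')
print(sorted_df.iloc[0, 0])        # Charlie (first row, first column after sorting)
```

After sorting, the row labelled `'a'` is no longer the first row, which is why `loc['a', ...]` and `iloc[0, ...]` return different people.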
💡 Think of Pandas as Excel in Python: you can read data, analyze it, filter, and perform calculations — all programmatically.
Mini Project Step
Load a CSV file with sales data. Calculate total sales per product, filter products above a certain threshold, and create a new column showing the discounted price (10% off).
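One possible starting point is sketched below. Everything specific in it is an assumption: the column names `product`, `price`, and `quantity`, the sample rows, and the threshold of 40 are made up for illustration, and the CSV is read from an in-memory string so the example runs without a file. It also uses `groupby`, which this lesson has not covered, to total sales per product.

```python
import io
import pandas as pd

# Hypothetical sales data standing in for a real CSV file;
# the column names and values are assumptions for this sketch
csv_text = """product,price,quantity
Widget,10.0,3
Gadget,25.0,2
Widget,10.0,5
Gizmo,5.0,1
"""
sales = pd.read_csv(io.StringIO(csv_text))

# Sales amount for each row
sales['total'] = sales['price'] * sales['quantity']

# Total sales per product
per_product = sales.groupby('product')['total'].sum()

# Keep only products whose total exceeds an assumed threshold
threshold = 40.0
top_products = per_product[per_product > threshold]

# New column with the price after a 10% discount
sales['discounted_price'] = sales['price'] * 0.9

print(top_products)
print(sales)
```

With a real file, replace `io.StringIO(csv_text)` with the file path and adjust the column names to match your data.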