29. Introduction to Machine Learning with Scikit-learn

Level: Advanced
Duration: 40m

What Is Machine Learning?

Machine Learning (ML) is a branch of AI that allows computers to learn from data and make predictions or decisions without being explicitly programmed for every task. Python, with libraries like Scikit-learn, makes ML accessible and practical.

Supervised vs Unsupervised Learning

In supervised learning, models are trained on labeled data (inputs paired with known outputs) so they can predict the output for new inputs. In unsupervised learning, there are no labels; the model looks for patterns or groupings in the inputs alone.

Type	Description	Examples
Supervised Learning	Predict outcomes from labeled data	Linear Regression, Decision Trees, Classification
Unsupervised Learning	Find patterns in unlabeled data	Clustering, Dimensionality Reduction
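
To make the contrast concrete, here is a minimal sketch that runs one supervised and one unsupervised model on the same points. The toy data and variable names are invented for illustration; the only difference that matters is whether the labels are passed to the model.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.tree import DecisionTreeClassifier

# Hypothetical toy data: two well-separated groups of 2-D points
X = np.array([[1.0, 1.0], [1.2, 0.8], [0.9, 1.1],
              [8.0, 8.0], [8.2, 7.9], [7.8, 8.1]])
labels = np.array([0, 0, 0, 1, 1, 1])  # only used in the supervised case

# Supervised: the model is given both the features and the known labels
clf = DecisionTreeClassifier(random_state=0)
clf.fit(X, labels)
supervised_pred = clf.predict([[1.1, 0.9], [7.9, 8.0]])

# Unsupervised: the model is given only the features and groups them itself
km = KMeans(n_clusters=2, n_init=10, random_state=0)
cluster_ids = km.fit_predict(X)

print("Supervised predictions:", supervised_pred)
print("Cluster assignments:", cluster_ids)
```

The cluster numbers that KMeans assigns are arbitrary, so the first group may come out as cluster 0 or cluster 1; what matters is that points from the same group share an ID.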

Getting Started with Scikit-learn

Scikit-learn is a popular Python library for machine learning. It provides simple and consistent tools for data preprocessing, model building, and evaluation.

bash

pip install scikit-learn pandas numpy

Example: Simple Linear Regression

Let's predict a target variable based on one feature using linear regression.

python

import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error

# Sample dataset
data = pd.DataFrame({
    'Hours_Studied': [1, 2, 3, 4, 5],
    'Score': [10, 20, 30, 40, 50]
})

X = data[['Hours_Studied']]
y = data['Score']

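# Hold out 20% of the rows for testing; with 5 rows that is a single test row
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Fit the model on the training rows only
model = LinearRegression()
model.fit(X_train, y_train)

# Predict the held-out row and compare against its true score
predictions = model.predict(X_test)

print("Predictions:", predictions)
print("MSE:", mean_squared_error(y_test, predictions))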
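
It can also help to look at what the model actually learned. The sketch below is a variation on the example above: it assumes the same five-row dataset but fits on all rows (no train/test split) purely to keep the inspection simple, then reads the fitted slope and intercept from `coef_` and `intercept_`.

```python
import pandas as pd
from sklearn.linear_model import LinearRegression

# Same sample dataset as in the example above
data = pd.DataFrame({
    'Hours_Studied': [1, 2, 3, 4, 5],
    'Score': [10, 20, 30, 40, 50]
})

# Fit on every row here (an assumption made only to keep the sketch short)
model = LinearRegression()
model.fit(data[['Hours_Studied']], data['Score'])

# A fitted linear model is just a slope per feature plus an intercept
slope = model.coef_[0]
intercept = model.intercept_
print(f"Score ~ {slope:.2f} * Hours_Studied + {intercept:.2f}")

# Predict for a value outside the training data
new_hours = pd.DataFrame({'Hours_Studied': [6]})
predicted = model.predict(new_hours)[0]
print("Predicted score for 6 hours:", predicted)
```

Because the sample data lies exactly on a straight line, the fitted slope comes out at about 10 and the intercept at about 0, up to floating-point rounding.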

Evaluating Machine Learning Models

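Common metrics for regression include Mean Squared Error (MSE), the average squared difference between predicted and actual values (lower is better), and the R² score, the proportion of variance in the target that the model explains (closer to 1 is better). For classification, we use accuracy, precision, recall, and F1-score.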
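
As a rough illustration, the metrics named above can all be computed with functions from `sklearn.metrics`. The true and predicted values below are made up for the example; in practice they would come from your test set and your model.

```python
from sklearn.metrics import (
    accuracy_score,
    f1_score,
    mean_squared_error,
    precision_score,
    r2_score,
    recall_score,
)

# Hypothetical regression results (true values vs. model predictions)
y_true_reg = [3.0, 5.0, 7.0, 9.0]
y_pred_reg = [2.5, 5.0, 7.5, 9.0]

mse = mean_squared_error(y_true_reg, y_pred_reg)
r2 = r2_score(y_true_reg, y_pred_reg)
print("MSE:", mse)
print("R²:", r2)

# Hypothetical binary classification results (1 = positive class)
y_true_clf = [1, 0, 1, 1, 0, 1, 0, 0]
y_pred_clf = [1, 0, 0, 1, 0, 1, 1, 1]

accuracy = accuracy_score(y_true_clf, y_pred_clf)    # share of correct predictions
precision = precision_score(y_true_clf, y_pred_clf)  # of predicted positives, how many are right
recall = recall_score(y_true_clf, y_pred_clf)        # of actual positives, how many were found
f1 = f1_score(y_true_clf, y_pred_clf)                # harmonic mean of precision and recall
print("Accuracy:", accuracy)
print("Precision:", precision)
print("Recall:", recall)
print("F1:", f1)
```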

Best Practices and Tips

Split data into training and testing sets to evaluate model performance.
Normalize or scale features for algorithms sensitive to feature magnitude.
Detect overfitting with cross-validation and reduce it with regularization.
Document and track experiments with clear versioning.
Start simple; try linear models before complex ones.
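
Several of these tips can be combined in a few lines. The sketch below is one possible setup, not the only one: it scales features with `StandardScaler`, uses `Ridge` (a linear model with regularization), and scores the result with 5-fold cross-validation. The synthetic data, the `alpha=1.0` value, and the choice of 5 folds are assumptions made for illustration.

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic data (illustrative only): two features on very different scales
rng = np.random.default_rng(42)
X = np.column_stack([
    rng.uniform(0, 1, 100),      # feature in the range 0..1
    rng.uniform(0, 1000, 100),   # feature in the range 0..1000
])
y = 3.0 * X[:, 0] + 0.002 * X[:, 1] + rng.normal(0, 0.1, 100)

# Scaling and a simple regularized linear model, chained as one estimator
pipeline = make_pipeline(StandardScaler(), Ridge(alpha=1.0))

# 5-fold cross-validation: each fold is held out once for scoring
scores = cross_val_score(pipeline, X, y, cv=5, scoring="r2")
print("R² per fold:", scores)
print("Mean R²:", scores.mean())
```

Putting the scaler inside the pipeline means it is re-fitted on the training portion of each fold, so no information from the held-out fold leaks into preprocessing.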

Mini Exercise

Use a dataset of your choice and build a regression model to predict a numeric outcome. Split the data, train the model, make predictions, and calculate the mean squared error.

Scikit-learn Official Documentation