
  • What Is Machine Learning?
  • Supervised vs Unsupervised Learning
  • Getting Started with Scikit-learn
  • Example: Simple Linear Regression
  • Evaluating Machine Learning Models
  • Best Practices and Tips
  • Mini Exercise

29. Introduction to Machine Learning with Scikit-learn

Level: Advanced
Duration: 40m

What Is Machine Learning?

Machine Learning (ML) is a branch of artificial intelligence (AI) in which computers learn patterns from data and use them to make predictions or decisions, without being explicitly programmed for every task. Python, with libraries like Scikit-learn, makes ML accessible and practical.

Supervised vs Unsupervised Learning

In supervised learning, models are trained on labeled data to predict outcomes. In unsupervised learning, models find patterns or groupings in unlabeled data.

Type                  | Description                        | Examples
Supervised Learning   | Predict outcomes from labeled data | Linear Regression, Decision Trees, Classification
Unsupervised Learning | Find patterns in unlabeled data    | Clustering, Dimensionality Reduction
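
To make the distinction concrete, here is a minimal sketch on hypothetical toy data (the points and labels below are made up for illustration). The supervised model receives both features and labels, while the clustering model receives only the features.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.tree import DecisionTreeClassifier

# Hypothetical toy data: two well-separated groups of 2-D points.
X = np.array([[1.0, 1.0], [1.2, 0.8], [0.9, 1.1],
              [8.0, 8.0], [8.2, 7.9], [7.8, 8.1]])
y = np.array([0, 0, 0, 1, 1, 1])  # labels, used only in the supervised case

# Supervised: fit on features AND labels, then predict labels for new points.
clf = DecisionTreeClassifier(random_state=0)
clf.fit(X, y)
supervised_pred = clf.predict([[1.1, 0.9], [7.9, 8.0]])

# Unsupervised: fit on features only; the model groups the points itself.
km = KMeans(n_clusters=2, n_init=10, random_state=0)
cluster_ids = km.fit_predict(X)

print("Predicted labels:", supervised_pred)
print("Cluster assignments:", cluster_ids)
```

The cluster numbers are arbitrary identifiers: KMeans may call the first group 0 or 1, so only the grouping itself is meaningful.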

Getting Started with Scikit-learn

Scikit-learn is a popular Python library for machine learning. It provides simple and consistent tools for data preprocessing, model building, and evaluation.

bash
pip install scikit-learn pandas numpy
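
Once the install finishes, a short script can confirm it worked and illustrate what "consistent tools" means in practice. The sketch below assumes a small made-up dataset and shows two different estimators being used through the same `fit` and `predict` calls.

```python
import numpy as np
import sklearn
from sklearn.linear_model import LinearRegression
from sklearn.tree import DecisionTreeRegressor

print("scikit-learn version:", sklearn.__version__)

# Hypothetical data: one feature, perfectly linear target.
X = np.array([[1], [2], [3], [4], [5]])
y = np.array([10, 20, 30, 40, 50])

# Different model types share the same interface: fit(), then predict().
models = [LinearRegression(), DecisionTreeRegressor(random_state=0)]
results = {}
for model in models:
    model.fit(X, y)
    name = type(model).__name__
    results[name] = model.predict(np.array([[3]]))[0]
    print(name, "predicts", results[name])
```

Because the interface is shared, swapping one model for another usually means changing a single line.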

Example: Simple Linear Regression

Let's predict a target variable based on one feature using linear regression.

python
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error

# Sample dataset
data = pd.DataFrame({
    'Hours_Studied': [1, 2, 3, 4, 5],
    'Score': [10, 20, 30, 40, 50]
})

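# Features must be 2-D (a DataFrame), the target 1-D (a Series)
X = data[['Hours_Studied']]
y = data['Score']

# With only 5 rows, test_size=0.2 holds out exactly one row for testing
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Fit on the training rows, then predict the held-out row
model = LinearRegression()
model.fit(X_train, y_train)
predictions = model.predict(X_test)

# The data is perfectly linear, so the MSE should be (almost exactly) zero
print("Predictions:", predictions)
print("MSE:", mean_squared_error(y_test, predictions))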
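
As a follow-up, it can help to look at what the model actually learned. The sketch below refits on all five rows of the same sample data (skipping the split, purely for illustration) and reads the slope and intercept from the fitted model. The 6-hour input is a hypothetical value, not part of the dataset.

```python
import pandas as pd
from sklearn.linear_model import LinearRegression

data = pd.DataFrame({
    "Hours_Studied": [1, 2, 3, 4, 5],
    "Score": [10, 20, 30, 40, 50],
})

model = LinearRegression()
model.fit(data[["Hours_Studied"]], data["Score"])

# coef_ holds one slope per feature; intercept_ is the constant term.
slope = model.coef_[0]
intercept = model.intercept_
print(f"Score = {slope:.2f} * Hours_Studied + {intercept:.2f}")

# Predict for an unseen (hypothetical) input of 6 hours.
new_hours = pd.DataFrame({"Hours_Studied": [6]})
predicted = model.predict(new_hours)[0]
print(f"Predicted score for 6 hours: {predicted:.2f}")
```

Since every score in the sample is exactly ten times the hours studied, the learned slope comes out at about 10 and the intercept at about 0.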

Evaluating Machine Learning Models

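Common metrics for regression include Mean Squared Error (MSE), the average squared difference between predicted and actual values (lower is better), and the R² score, the proportion of variance in the target that the model explains (1.0 is a perfect fit). For classification, common metrics are accuracy, precision, recall, and F1-score.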
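
All of these metrics are available in `sklearn.metrics`. The sketch below computes them on small hand-written arrays (the true and predicted values are made up for illustration), so each result can be checked by hand.

```python
from sklearn.metrics import (
    accuracy_score, f1_score, mean_squared_error,
    precision_score, r2_score, recall_score,
)

# Regression: hypothetical true values and predictions.
y_true_reg = [10, 20, 30, 40]
y_pred_reg = [12, 18, 33, 39]
mse = mean_squared_error(y_true_reg, y_pred_reg)  # (4 + 4 + 9 + 1) / 4
r2 = r2_score(y_true_reg, y_pred_reg)             # 1 - 18 / 500
print("MSE:", mse)
print("R2:", r2)

# Classification: hypothetical binary labels (1 = positive class).
# Here: 3 true positives, 2 false positives, 1 false negative, 2 true negatives.
y_true_clf = [1, 0, 1, 1, 0, 1, 0, 0]
y_pred_clf = [1, 0, 0, 1, 0, 1, 1, 1]
accuracy = accuracy_score(y_true_clf, y_pred_clf)    # 5 correct out of 8
precision = precision_score(y_true_clf, y_pred_clf)  # TP / (TP + FP) = 3 / 5
recall = recall_score(y_true_clf, y_pred_clf)        # TP / (TP + FN) = 3 / 4
f1 = f1_score(y_true_clf, y_pred_clf)                # harmonic mean of the two
print("Accuracy:", accuracy)
print("Precision:", precision)
print("Recall:", recall)
print("F1:", f1)
```

Precision and recall differ here because the predictions contain more false positives than false negatives, which accuracy alone would not reveal.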

Best Practices and Tips

  • Split data into training and testing sets to evaluate model performance.
  • Normalize or scale features for algorithms sensitive to feature magnitude.
  • Guard against overfitting: use cross-validation to detect it and regularization to reduce it.
  • Document and track experiments with clear versioning.
  • Start simple; try linear models before complex ones.
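
Several of these tips can be combined in a few lines. The sketch below is one possible setup, not the only one: it assumes synthetic data with two features on very different scales, scales them inside a pipeline, uses Ridge (a regularized linear model) as the simple starting point, and evaluates with 5-fold cross-validation. The data, the `alpha` value, and the fold count are illustrative choices.

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic data (an assumption for this sketch): the second feature is
# roughly 1000x larger in magnitude than the first.
rng = np.random.default_rng(42)
X = np.column_stack([rng.normal(0, 1, 100), rng.normal(0, 1000, 100)])
y = 3.0 * X[:, 0] + 0.002 * X[:, 1] + rng.normal(0, 0.1, 100)

# Putting the scaler in the pipeline means it is refit on each training
# fold, so no information from the validation fold leaks into scaling.
pipeline = make_pipeline(StandardScaler(), Ridge(alpha=1.0))

# scikit-learn scorers follow "higher is better", so MSE is returned negated.
scores = cross_val_score(pipeline, X, y, cv=5, scoring="neg_mean_squared_error")
mse_per_fold = -scores

print("MSE per fold:", mse_per_fold)
print("Mean MSE:", mse_per_fold.mean())
```

If the per-fold errors vary widely, or are much worse than the training error, that is a sign the model may be overfitting.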

Mini Exercise

Use a dataset of your choice and build a regression model to predict a numeric outcome. Split the data, train the model, make predictions, and calculate the mean squared error.
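
If you are unsure where to start, the sketch below shows one possible skeleton. It assumes the diabetes dataset that ships with Scikit-learn (`load_diabetes`), chosen here only because it needs no download; any dataset with a numeric target works the same way.

```python
from sklearn.datasets import load_diabetes
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

# Load features and a numeric target (a disease progression measure).
X, y = load_diabetes(return_X_y=True)

# Hold out 20% of the rows for testing.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Train, predict, and evaluate.
model = LinearRegression()
model.fit(X_train, y_train)
predictions = model.predict(X_test)
mse = mean_squared_error(y_test, predictions)

print(f"Test MSE: {mse:.1f}")
```

Try replacing the dataset or the model and compare the resulting error.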

Scikit-learn Official Documentation