29. Introduction to Machine Learning with Scikit-learn
What Is Machine Learning?
Machine Learning (ML) is a branch of AI that allows computers to learn from data and make predictions or decisions without being explicitly programmed for every task. Python, with libraries like Scikit-learn, makes ML accessible and practical.
Supervised vs Unsupervised Learning
In supervised learning, models are trained on labeled data to predict outcomes. In unsupervised learning, models find patterns or groupings in unlabeled data.
| Type | Description | Examples |
|---|---|---|
| Supervised Learning | Predict outcomes from labeled data | Regression and classification (e.g., Linear Regression, Logistic Regression, Decision Trees) |
| Unsupervised Learning | Find patterns in unlabeled data | Clustering (e.g., K-Means), Dimensionality Reduction (e.g., PCA) |
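In scikit-learn, the difference shows up directly in the `fit` call. A supervised estimator is fit on features *and* labels (`fit(X, y)`). An unsupervised one is fit on features alone (`fit(X)`). The sketch below uses made-up toy data purely for illustration, with `LinearRegression` as the supervised example and `KMeans` as the unsupervised one:

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.cluster import KMeans

# Toy feature matrix (illustrative data): one feature, six samples
X = np.array([[1.0], [2.0], [3.0], [10.0], [11.0], [12.0]])

# Supervised: labels y are provided, and the model learns the mapping X -> y
y = np.array([2.0, 4.0, 6.0, 20.0, 22.0, 24.0])  # here y = 2x
reg = LinearRegression().fit(X, y)
print("Predicted y for x=4:", reg.predict([[4.0]]))

# Unsupervised: no labels; KMeans groups samples by similarity
km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
print("Cluster labels:", km.labels_)
```

The regression model predicts about 8 for `x=4`. KMeans puts the three small values in one cluster and the three large values in the other, although which cluster gets label 0 or 1 is arbitrary.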
Getting Started with Scikit-learn
Scikit-learn is a popular Python library for machine learning. It provides simple and consistent tools for data preprocessing, model building, and evaluation.
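Much of that consistency comes from a shared interface. Preprocessors expose `fit`/`transform`, and models expose `fit`/`predict`/`score`, so swapping one model for another rarely changes the surrounding code. The following is a small sketch on invented data (`y = 3x + 1`). It runs two different regressors through identical calls:

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.tree import DecisionTreeRegressor
from sklearn.preprocessing import StandardScaler

# Illustrative data: y = 3x + 1 for x = 0..9
X = np.arange(10, dtype=float).reshape(-1, 1)
y = 3 * X.ravel() + 1

# Preprocessing objects share fit / transform (fit_transform does both)
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)

# Different models, identical calls: fit, predict, score
results = {}
for model in (LinearRegression(), DecisionTreeRegressor(random_state=0)):
    model.fit(X_scaled, y)                     # learn from the data
    first_pred = model.predict(X_scaled[:1])   # predict for a row (here, the first sample)
    r2 = model.score(X_scaled, y)              # default regression score is R²
    results[type(model).__name__] = r2
    print(f"{type(model).__name__}: first prediction={first_pred[0]:.2f}, R^2={r2:.3f}")
```

Both models are scored on their own training data here only to demonstrate the API. That number overstates real performance, which should be measured on held-out data.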
```bash
pip install scikit-learn pandas numpy
```

Example: Simple Linear Regression
Let's predict a target variable based on one feature using linear regression.
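```python
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error

# Sample dataset: exam score as a function of hours studied
data = pd.DataFrame({
    'Hours_Studied': [1, 2, 3, 4, 5],
    'Score': [10, 20, 30, 40, 50]
})

# Features must be 2-D (a DataFrame); the target is 1-D (a Series)
X = data[['Hours_Studied']]
y = data['Score']

# Hold out 20% of the rows (here, 1 of 5) for testing
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Fit the model on the training data only
model = LinearRegression()
model.fit(X_train, y_train)

# Predict on the unseen test row and measure the error
# (the sample data is perfectly linear, so the MSE will be close to zero)
predictions = model.predict(X_test)
print("Predictions:", predictions)
print("MSE:", mean_squared_error(y_test, predictions))
```

Evaluating Machine Learning Models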
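Common metrics for regression include Mean Squared Error (MSE), the average squared difference between predictions and true values (lower is better), and the R² score, the proportion of the target's variance explained by the model (1.0 is a perfect fit). For classification, common metrics are accuracy, precision, recall, and F1-score.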
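All of these metrics live in `sklearn.metrics` and take the true values first and the predictions second. The values below are invented so that the results can be checked by hand:

```python
from sklearn.metrics import (
    mean_squared_error, r2_score,
    accuracy_score, precision_score, recall_score, f1_score,
)

# Regression: made-up true values and predictions
y_true_reg = [3.0, 5.0, 7.0, 9.0]
y_pred_reg = [2.5, 5.0, 7.5, 9.0]
mse = mean_squared_error(y_true_reg, y_pred_reg)  # (0.25 + 0 + 0.25 + 0) / 4 = 0.125
r2 = r2_score(y_true_reg, y_pred_reg)             # 1 - 0.5 / 20 = 0.975
print(f"MSE={mse:.3f}, R^2={r2:.3f}")

# Binary classification: made-up labels (1 = positive class)
y_true_clf = [1, 0, 1, 1, 0, 1]
y_pred_clf = [1, 0, 0, 1, 1, 1]  # 3 true positives, 1 false positive, 1 false negative, 1 true negative
acc = accuracy_score(y_true_clf, y_pred_clf)    # 4 correct out of 6
prec = precision_score(y_true_clf, y_pred_clf)  # TP / (TP + FP) = 3 / 4
rec = recall_score(y_true_clf, y_pred_clf)      # TP / (TP + FN) = 3 / 4
f1 = f1_score(y_true_clf, y_pred_clf)           # harmonic mean of precision and recall
print(f"accuracy={acc:.3f}, precision={prec:.2f}, recall={rec:.2f}, F1={f1:.2f}")
```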
Best Practices and Tips
- Split data into training and testing sets to evaluate model performance.
- Normalize or scale features for algorithms sensitive to feature magnitude (e.g., k-nearest neighbors, support vector machines, regularized linear models).
- Detect overfitting with cross-validation, and reduce it with regularization.
- Document and track experiments with clear versioning.
- Start simple; try linear models before complex ones.
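Several of these tips combine naturally in a scikit-learn `Pipeline`. The sketch below chains a `StandardScaler` with a regularized linear model (`Ridge`) and evaluates it with 5-fold cross-validation. The data comes from `make_regression` and is synthetic, and the inflated first feature and `alpha=1.0` are arbitrary illustrative choices. Putting the scaler inside the pipeline means it is refit on each training fold only, so information from the test fold never leaks into preprocessing:

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic data (illustrative): 200 samples, 5 features
X, y = make_regression(n_samples=200, n_features=5, noise=10.0, random_state=0)
X[:, 0] *= 1000  # pretend one feature is measured in much larger units

# Scaling + L2-regularized linear regression; alpha=1.0 is an arbitrary starting value
pipe = make_pipeline(StandardScaler(), Ridge(alpha=1.0))

# 5-fold CV: the scaler is refit inside each training fold
scores = cross_val_score(pipe, X, y, cv=5, scoring="neg_mean_squared_error")
mse_per_fold = -scores  # scikit-learn negates error metrics so that higher is always better
print("MSE per fold:", np.round(mse_per_fold, 1))
print("Mean MSE:", round(float(mse_per_fold.mean()), 1))
```

Fold-to-fold variation in the MSE gives a rough sense of how stable the model's performance is.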
Mini Exercise
Use a dataset of your choice and build a regression model to predict a numeric outcome. Split the data, train the model, make predictions, and calculate the mean squared error.
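One possible starting point, if you don't have a dataset at hand, is the diabetes dataset bundled with scikit-learn (`load_diabetes`). It has 442 patients and 10 features, and the target is a measure of disease progression one year after baseline. The sketch follows the same steps as the earlier example. It is one way to approach the exercise, not the only one:

```python
from sklearn.datasets import load_diabetes
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

# Load features as a pandas DataFrame and the target as a Series
X, y = load_diabetes(return_X_y=True, as_frame=True)

# Split: 80% training, 20% testing
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Train
model = LinearRegression()
model.fit(X_train, y_train)

# Predict and evaluate on the held-out test set
predictions = model.predict(X_test)
mse = mean_squared_error(y_test, predictions)
print(f"Test MSE: {mse:.1f}")
```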