Support Vector Machines
Support Vector Machines (SVMs) are powerful classifiers that find the optimal hyperplane to separate classes. Unlike other classifiers that just find any separating boundary, SVMs find the boundary with the maximum margin - the largest possible distance between the boundary and the nearest data points from each class. This makes SVMs robust and great at generalizing to new data.
What is a Hyperplane?
A hyperplane is a decision boundary that separates different classes. In 2D, it's a line. In 3D, it's a plane. In higher dimensions, we call it a hyperplane. SVM finds the hyperplane that maximizes the margin between classes.
Support Vectors
The data points closest to the decision boundary. These "support" the hyperplane - if you remove other points, the boundary stays the same. Only support vectors matter for the final model.
Margin
The distance between the hyperplane and the nearest support vectors. SVM maximizes this margin, creating a "safety buffer" that improves generalization to new data.
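For a linear SVM, the margin width works out to 2 / ||w||, so once a model is fitted you can read it straight off coef_. A minimal sketch on a hypothetical toy dataset (the six points and the very large C, which approximates a hard margin, are illustrative assumptions, not from the text above):

```python
import numpy as np
from sklearn.svm import SVC

# Toy linearly separable data: two small clusters in 2D (illustrative)
X = np.array([[0, 0], [0, 1], [1, 0], [3, 3], [3, 4], [4, 3]])
y = np.array([0, 0, 0, 1, 1, 1])

clf = SVC(kernel='linear', C=1e6)  # very large C ~ hard margin
clf.fit(X, y)

w = clf.coef_[0]
margin_width = 2 / np.linalg.norm(w)  # distance between the two margin lines
print(f"Margin width: {margin_width:.3f}")
print(f"Support vectors:\n{clf.support_vectors_}")
```

Only the points printed as support vectors pin down the boundary; deleting any of the others leaves the fitted hyperplane unchanged.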
Hard Margin vs Soft Margin
In a perfect world, data is linearly separable and we can find a hyperplane with no misclassifications. This is called hard margin SVM. But real data is messy - some points may be on the wrong side. Soft margin SVM allows some misclassifications controlled by the C parameter.
# Basic SVM Classification
from sklearn.svm import SVC
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
# Load and prepare data
iris = load_iris()
X, y = iris.data[:, :2], iris.target # Use first 2 features for visualization
# SVM requires feature scaling!
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)
X_train, X_test, y_train, y_test = train_test_split(
X_scaled, y, test_size=0.2, random_state=42
)
# Train SVM with linear kernel
svm_clf = SVC(kernel='linear', C=1.0, random_state=42)
svm_clf.fit(X_train, y_train)
print(f"Training Accuracy: {svm_clf.score(X_train, y_train):.2%}")
print(f"Test Accuracy: {svm_clf.score(X_test, y_test):.2%}")
print(f"Number of Support Vectors: {svm_clf.n_support_}")
The C Parameter
The C parameter controls the trade-off between having a smooth decision boundary and classifying training points correctly. Think of C as "how much you care about mistakes":
Low C (e.g., 0.01)
- Wider margin (more tolerance for errors)
- More regularization
- Simpler decision boundary
- May underfit if too low
- Better for noisy data
High C (e.g., 100)
- Narrower margin (less tolerance)
- Less regularization
- Complex decision boundary
- May overfit if too high
- Better for clean data
# Effect of C parameter
from sklearn.svm import SVC
# Low C - wider margin, more misclassifications allowed
svm_low_c = SVC(kernel='linear', C=0.01, random_state=42)
svm_low_c.fit(X_train, y_train)
print(f"Low C (0.01): Train={svm_low_c.score(X_train, y_train):.2%}, Test={svm_low_c.score(X_test, y_test):.2%}")
# High C - narrow margin, fewer misclassifications
svm_high_c = SVC(kernel='linear', C=100, random_state=42)
svm_high_c.fit(X_train, y_train)
print(f"High C (100): Train={svm_high_c.score(X_train, y_train):.2%}, Test={svm_high_c.score(X_test, y_test):.2%}")
Practice Questions: SVM Basics
Test your understanding with these coding challenges.
Task: Load the breast cancer dataset, scale features, train a linear SVM with C=1.0, and print accuracy.
Solution:
from sklearn.svm import SVC
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
data = load_breast_cancer()
X_train, X_test, y_train, y_test = train_test_split(
data.data, data.target, test_size=0.2, random_state=42
)
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)
svm = SVC(kernel='linear', C=1.0, random_state=42)
svm.fit(X_train_scaled, y_train)
print(f"Accuracy: {svm.score(X_test_scaled, y_test):.2%}")
Task: Train SVMs with C values [0.001, 0.1, 1, 10, 100] and compare train/test accuracy.
Solution:
from sklearn.svm import SVC
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
data = load_breast_cancer()
X_train, X_test, y_train, y_test = train_test_split(
data.data, data.target, test_size=0.2, random_state=42
)
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)
C_values = [0.001, 0.1, 1, 10, 100]
for C in C_values:
    svm = SVC(kernel='linear', C=C, random_state=42)
    svm.fit(X_train_scaled, y_train)
    train_acc = svm.score(X_train_scaled, y_train)
    test_acc = svm.score(X_test_scaled, y_test)
    print(f"C={C:6.3f}: Train={train_acc:.2%}, Test={test_acc:.2%}")
Task: For C values [0.01, 1, 100], print the number of support vectors. What pattern do you notice?
Solution:
from sklearn.svm import SVC
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
data = load_breast_cancer()
X_train, X_test, y_train, y_test = train_test_split(
data.data, data.target, test_size=0.2, random_state=42
)
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
for C in [0.01, 1, 100]:
    svm = SVC(kernel='linear', C=C, random_state=42)
    svm.fit(X_train_scaled, y_train)
    n_sv = sum(svm.n_support_)
    print(f"C={C:5.2f}: {n_sv} support vectors")
# Lower C -> more support vectors (wider margin includes more points)
Kernel Tricks
What if your data isn't linearly separable? You can't draw a straight line (or hyperplane) to separate the classes. The kernel trick is SVM's secret weapon - it transforms data into a higher-dimensional space where it becomes linearly separable, without actually computing the transformation!
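A small sketch makes the idea concrete: concentric circles cannot be split by a straight line in 2D, but adding a single hand-crafted feature r^2 = x1^2 + x2^2 makes them linearly separable. This is what the kernel trick achieves implicitly, without ever materializing the new features (the dataset parameters here are illustrative):

```python
import numpy as np
from sklearn.datasets import make_circles
from sklearn.svm import SVC

# Concentric circles: no straight line separates the two classes
X, y = make_circles(n_samples=200, noise=0.05, factor=0.5, random_state=42)

linear_2d = SVC(kernel='linear').fit(X, y)
print(f"Linear SVM in 2D:        {linear_2d.score(X, y):.2%}")  # typically poor

# Manually add a third feature r^2 = x1^2 + x2^2 - the kind of lift
# the kernel trick performs implicitly
X_3d = np.column_stack([X, (X ** 2).sum(axis=1)])
linear_3d = SVC(kernel='linear').fit(X_3d, y)
print(f"Linear SVM in lifted 3D: {linear_3d.score(X_3d, y):.2%}")
```

In the lifted space the inner and outer circles sit at different heights along the r^2 axis, so a flat plane separates them cleanly.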
Common Kernel Types
Linear Kernel
No transformation. Works when data is already linearly separable. Fastest to train.
kernel='linear'
RBF Kernel
Radial Basis Function. The most popular kernel. Creates smooth, flexible non-linear decision boundaries. A good default choice.
kernel='rbf'
Polynomial Kernel
Creates polynomial decision boundaries. Degree parameter controls complexity.
kernel='poly'
# Comparing Different Kernels
from sklearn.svm import SVC
from sklearn.datasets import make_moons
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
# Generate non-linear data
X, y = make_moons(n_samples=200, noise=0.15, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)
# Test different kernels
kernels = ['linear', 'rbf', 'poly']
for kernel in kernels:
    svm = SVC(kernel=kernel, random_state=42)
    svm.fit(X_train_scaled, y_train)
    acc = svm.score(X_test_scaled, y_test)
    print(f"{kernel:8s} kernel: {acc:.2%}")
The Gamma Parameter (RBF Kernel)
For the RBF kernel, the gamma parameter controls how far the influence of a single
training example reaches. Think of it as how "local" vs "global" the decision boundary is:
Low Gamma (e.g., 0.01)
- Each point has wide influence
- Smoother, more global decision boundary
- May underfit (too simple)
- Less sensitive to individual points
High Gamma (e.g., 100)
- Each point has narrow influence
- Wiggly, more local decision boundary
- May overfit (too complex)
- Very sensitive to each point
# Effect of Gamma
from sklearn.svm import SVC
# Low gamma - smooth boundary
svm_low_gamma = SVC(kernel='rbf', gamma=0.01, random_state=42)
svm_low_gamma.fit(X_train_scaled, y_train)
print(f"Low gamma (0.01): {svm_low_gamma.score(X_test_scaled, y_test):.2%}")
# High gamma - wiggly boundary
svm_high_gamma = SVC(kernel='rbf', gamma=100, random_state=42)
svm_high_gamma.fit(X_train_scaled, y_train)
print(f"High gamma (100): {svm_high_gamma.score(X_test_scaled, y_test):.2%}")
# Default gamma='scale' often works well
svm_auto = SVC(kernel='rbf', gamma='scale', random_state=42)
svm_auto.fit(X_train_scaled, y_train)
print(f"Auto gamma: {svm_auto.score(X_test_scaled, y_test):.2%}")
The default, gamma='scale', uses 1 / (n_features * X.var()). This adapts to your data automatically and is a good baseline before tuning.
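You can verify the formula by hand: computing 1 / (n_features * X.var()) yourself and passing it explicitly should give a model identical to gamma='scale' (a small sketch on Iris; the dataset choice is arbitrary):

```python
from sklearn.svm import SVC
from sklearn.datasets import load_iris

X, y = load_iris(return_X_y=True)

# gamma='scale' resolves to 1 / (n_features * X.var())
gamma_manual = 1.0 / (X.shape[1] * X.var())

svc_scale = SVC(kernel='rbf', gamma='scale').fit(X, y)
svc_manual = SVC(kernel='rbf', gamma=gamma_manual).fit(X, y)

# Same gamma value -> same fitted model, same training accuracy
print(f"scale : {svc_scale.score(X, y):.2%}")
print(f"manual: {svc_manual.score(X, y):.2%}")
```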
Practice Questions: Kernel Tricks
Test your understanding with these coding challenges.
Task: Load digits dataset, scale features, train SVM with RBF kernel, print accuracy.
Solution:
from sklearn.svm import SVC
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
digits = load_digits()
X_train, X_test, y_train, y_test = train_test_split(
digits.data, digits.target, test_size=0.2, random_state=42
)
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)
svm = SVC(kernel='rbf', random_state=42)
svm.fit(X_train_scaled, y_train)
print(f"Accuracy: {svm.score(X_test_scaled, y_test):.2%}")
Task: Use make_circles() to generate data, compare linear, rbf, and poly kernels.
Solution:
from sklearn.svm import SVC
from sklearn.datasets import make_circles
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
X, y = make_circles(n_samples=200, noise=0.1, factor=0.5, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)
for kernel in ['linear', 'rbf', 'poly']:
    svm = SVC(kernel=kernel, random_state=42)
    svm.fit(X_train_scaled, y_train)
    print(f"{kernel:8s}: {svm.score(X_test_scaled, y_test):.2%}")
# RBF will perform best on circular data!
Task: Use cross_val_score to find the best gamma from [0.001, 0.01, 0.1, 1, 10].
Solution:
from sklearn.svm import SVC
from sklearn.datasets import load_digits
from sklearn.model_selection import cross_val_score
from sklearn.preprocessing import StandardScaler
import numpy as np
digits = load_digits()
scaler = StandardScaler()
X_scaled = scaler.fit_transform(digits.data)
gamma_values = [0.001, 0.01, 0.1, 1, 10]
best_gamma, best_score = None, 0
for gamma in gamma_values:
    svm = SVC(kernel='rbf', gamma=gamma, random_state=42)
    scores = cross_val_score(svm, X_scaled, digits.target, cv=5)
    mean_score = scores.mean()
    print(f"gamma={gamma:5.3f}: {mean_score:.2%} (+/- {scores.std()*2:.2%})")
    if mean_score > best_score:
        best_gamma, best_score = gamma, mean_score
print(f"\nBest: gamma={best_gamma} with {best_score:.2%}")
SVM Hyperparameter Tuning
SVM has two main hyperparameters to tune: C (regularization) and gamma (kernel
width for RBF). Finding the right combination is crucial for good performance. Grid Search and Random
Search are the standard approaches, and scikit-learn makes this easy with GridSearchCV.
Grid Search for SVM
Grid Search exhaustively tries all combinations of hyperparameters you specify. For SVM, we typically search over C and gamma values on a logarithmic scale:
# Grid Search for SVM Hyperparameters
from sklearn.svm import SVC
from sklearn.model_selection import GridSearchCV
from sklearn.datasets import load_digits
from sklearn.preprocessing import StandardScaler
digits = load_digits()
scaler = StandardScaler()
X_scaled = scaler.fit_transform(digits.data)
# Define parameter grid (logarithmic scale)
param_grid = {
'C': [0.1, 1, 10, 100],
'gamma': [0.001, 0.01, 0.1, 1],
'kernel': ['rbf']
}
# Grid Search with 5-fold cross-validation
svm = SVC(random_state=42)
grid_search = GridSearchCV(svm, param_grid, cv=5, scoring='accuracy', n_jobs=-1)
grid_search.fit(X_scaled, digits.target)
print(f"Best Parameters: {grid_search.best_params_}")
print(f"Best CV Score: {grid_search.best_score_:.2%}")
Visualizing the Parameter Space
# Visualize Grid Search Results
import pandas as pd
results = pd.DataFrame(grid_search.cv_results_)
pivot = results.pivot(index='param_C', columns='param_gamma', values='mean_test_score')
print("\nAccuracy for each C/gamma combination:")
print(pivot.round(3))
Using the Best Model
# Use the best model for predictions
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(
X_scaled, digits.target, test_size=0.2, random_state=42
)
# Caution: best_estimator_ was refit on ALL the data passed to grid_search.fit(),
# so scoring it on this test split is optimistic - it has already seen these samples
best_svm = grid_search.best_estimator_
print(f"Test Accuracy (leaky): {best_svm.score(X_test, y_test):.2%}")
# For an honest estimate, refit a fresh model with the best params
# on the training split only
final_svm = SVC(**grid_search.best_params_, random_state=42)
final_svm.fit(X_train, y_train)
print(f"Final Model Accuracy: {final_svm.score(X_test, y_test):.2%}")
Practice Questions: Hyperparameter Tuning
Test your understanding with these coding challenges.
Task: Use GridSearchCV to find optimal C from [0.1, 1, 10] for linear SVM.
Solution:
from sklearn.svm import SVC
from sklearn.model_selection import GridSearchCV
from sklearn.datasets import load_breast_cancer
from sklearn.preprocessing import StandardScaler
data = load_breast_cancer()
scaler = StandardScaler()
X_scaled = scaler.fit_transform(data.data)
param_grid = {'C': [0.1, 1, 10], 'kernel': ['linear']}
grid = GridSearchCV(SVC(random_state=42), param_grid, cv=5)
grid.fit(X_scaled, data.target)
print(f"Best C: {grid.best_params_['C']}")
print(f"Best Score: {grid.best_score_:.2%}")
Task: Include both kernels in grid search. For RBF, also tune gamma.
Solution:
from sklearn.svm import SVC
from sklearn.model_selection import GridSearchCV
from sklearn.datasets import load_breast_cancer
from sklearn.preprocessing import StandardScaler
data = load_breast_cancer()
scaler = StandardScaler()
X_scaled = scaler.fit_transform(data.data)
# Different params for different kernels
param_grid = [
{'kernel': ['linear'], 'C': [0.1, 1, 10]},
{'kernel': ['rbf'], 'C': [0.1, 1, 10], 'gamma': [0.01, 0.1, 1]}
]
grid = GridSearchCV(SVC(random_state=42), param_grid, cv=5)
grid.fit(X_scaled, data.target)
print(f"Best Params: {grid.best_params_}")
print(f"Best Score: {grid.best_score_:.2%}")
Task: Use RandomizedSearchCV with loguniform distributions for C and gamma.
Solution:
from sklearn.svm import SVC
from sklearn.model_selection import RandomizedSearchCV
from sklearn.datasets import load_digits
from sklearn.preprocessing import StandardScaler
from scipy.stats import loguniform
digits = load_digits()
scaler = StandardScaler()
X_scaled = scaler.fit_transform(digits.data)
param_dist = {
'C': loguniform(0.01, 100),
'gamma': loguniform(0.001, 10),
'kernel': ['rbf']
}
random_search = RandomizedSearchCV(
SVC(random_state=42), param_dist,
n_iter=20, cv=5, random_state=42, n_jobs=-1
)
random_search.fit(X_scaled, digits.target)
print(f"Best Params: {random_search.best_params_}")
print(f"Best Score: {random_search.best_score_:.2%}")
Multi-layer Perceptrons
Multi-layer Perceptrons (MLPs) are the simplest form of neural networks. They consist of layers of
interconnected neurons that learn to recognize patterns through a process called backpropagation.
Scikit-learn provides MLPClassifier for easy neural network training without needing
deep learning frameworks like TensorFlow or PyTorch.
MLP Architecture
Input Layer
One neuron per feature. Receives raw data. No activation function - just passes data through.
Hidden Layers
Where the "learning" happens. Each neuron combines inputs with weights and applies an activation function (ReLU, tanh).
Output Layer
One neuron per class (for multi-class classification). Uses softmax to output probabilities that sum to 1; binary problems use a single logistic output neuron.
# Basic MLP Classifier
from sklearn.neural_network import MLPClassifier
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
digits = load_digits()
X_train, X_test, y_train, y_test = train_test_split(
digits.data, digits.target, test_size=0.2, random_state=42
)
# Neural networks need scaled data!
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)
# MLP with 2 hidden layers (100 and 50 neurons)
mlp = MLPClassifier(
hidden_layer_sizes=(100, 50), # Two hidden layers
activation='relu', # ReLU activation function
max_iter=500, # Maximum training iterations
random_state=42
)
mlp.fit(X_train_scaled, y_train)
print(f"Training Accuracy: {mlp.score(X_train_scaled, y_train):.2%}")
print(f"Test Accuracy: {mlp.score(X_test_scaled, y_test):.2%}")
print(f"Number of iterations: {mlp.n_iter_}")
Key Hyperparameters
hidden_layer_sizes
Tuple defining the number of neurons in each hidden layer.
- (100,) - One layer with 100 neurons
- (100, 50) - Two layers: 100 then 50
- (100, 100, 100) - Three layers of 100 each
activation
Activation function for hidden layers.
- 'relu' - Default; fast and works well
- 'tanh' - Outputs between -1 and 1
- 'logistic' - Sigmoid; outputs between 0 and 1
# Experimenting with Architectures
architectures = [
(50,), # Shallow: 1 layer, 50 neurons
(100, 50), # Medium: 2 layers
(100, 100, 50), # Deep: 3 layers
]
for arch in architectures:
    mlp = MLPClassifier(hidden_layer_sizes=arch, max_iter=500, random_state=42)
    mlp.fit(X_train_scaled, y_train)
    acc = mlp.score(X_test_scaled, y_test)
    print(f"Architecture {str(arch):20s}: {acc:.2%}")
Regularization and Learning Rate
# Preventing Overfitting with Regularization
mlp_reg = MLPClassifier(
hidden_layer_sizes=(100, 50),
alpha=0.01, # L2 regularization (higher = more regularization)
learning_rate='adaptive', # Adjust learning rate during training
learning_rate_init=0.001, # Initial learning rate
early_stopping=True, # Stop when validation score stops improving
validation_fraction=0.1, # Use 10% of training data for validation
n_iter_no_change=10, # Stop after 10 iterations without improvement
max_iter=500,
random_state=42
)
mlp_reg.fit(X_train_scaled, y_train)
print(f"Accuracy: {mlp_reg.score(X_test_scaled, y_test):.2%}")
print(f"Stopped at iteration: {mlp_reg.n_iter_}")
Common Pitfalls
- Not scaling data: MLPs are very sensitive to feature scales
- Not enough iterations: Increase max_iter if you see convergence warnings
- Too many neurons: Start small and increase if underfitting
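The first pitfall is worth demonstrating: train the same MLP with and without scaling. This sketch uses the breast cancer dataset (an illustrative choice; its features span very different ranges). The unscaled run may emit a convergence warning, and the scaled model typically scores noticeably higher:

```python
from sklearn.neural_network import MLPClassifier
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

data = load_breast_cancer()
X_train, X_test, y_train, y_test = train_test_split(
    data.data, data.target, test_size=0.2, random_state=42
)

# Identical architecture, raw vs scaled features
mlp_raw = MLPClassifier(hidden_layer_sizes=(50,), max_iter=500, random_state=42)
mlp_raw.fit(X_train, y_train)

scaler = StandardScaler()
X_train_s = scaler.fit_transform(X_train)
X_test_s = scaler.transform(X_test)
mlp_scaled = MLPClassifier(hidden_layer_sizes=(50,), max_iter=500, random_state=42)
mlp_scaled.fit(X_train_s, y_train)

print(f"Unscaled: {mlp_raw.score(X_test, y_test):.2%}")
print(f"Scaled:   {mlp_scaled.score(X_test_s, y_test):.2%}")
```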
Practice Questions: Multi-layer Perceptrons
Test your understanding with these coding challenges.
Task: Train MLPClassifier with one hidden layer of 50 neurons on scaled Iris data.
Solution:
from sklearn.neural_network import MLPClassifier
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
iris = load_iris()
X_train, X_test, y_train, y_test = train_test_split(
iris.data, iris.target, test_size=0.2, random_state=42
)
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)
mlp = MLPClassifier(hidden_layer_sizes=(50,), max_iter=500, random_state=42)
mlp.fit(X_train_scaled, y_train)
print(f"Accuracy: {mlp.score(X_test_scaled, y_test):.2%}")
Task: Compare 'relu', 'tanh', and 'logistic' activations on the digits dataset.
Solution:
from sklearn.neural_network import MLPClassifier
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
digits = load_digits()
X_train, X_test, y_train, y_test = train_test_split(
digits.data, digits.target, test_size=0.2, random_state=42
)
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)
for activation in ['relu', 'tanh', 'logistic']:
    mlp = MLPClassifier(
        hidden_layer_sizes=(100,), activation=activation,
        max_iter=500, random_state=42
    )
    mlp.fit(X_train_scaled, y_train)
    print(f"{activation:10s}: {mlp.score(X_test_scaled, y_test):.2%}")
Task: Train MLP with early_stopping=True and print the loss curve length.
Solution:
from sklearn.neural_network import MLPClassifier
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
digits = load_digits()
X_train, X_test, y_train, y_test = train_test_split(
digits.data, digits.target, test_size=0.2, random_state=42
)
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)
mlp = MLPClassifier(
hidden_layer_sizes=(100, 50),
early_stopping=True,
validation_fraction=0.1,
n_iter_no_change=10,
max_iter=1000,
random_state=42
)
mlp.fit(X_train_scaled, y_train)
print(f"Accuracy: {mlp.score(X_test_scaled, y_test):.2%}")
print(f"Iterations: {mlp.n_iter_}")
print(f"Loss curve length: {len(mlp.loss_curve_)}")
print(f"Final loss: {mlp.loss_curve_[-1]:.4f}")
Deep Learning Introduction
Deep learning is a subset of machine learning that uses neural networks with many layers (hence "deep"). While MLPClassifier is great for learning, real-world deep learning uses specialized frameworks like TensorFlow, PyTorch, or Keras that offer GPU acceleration, more layer types, and advanced architectures.
When to Use Deep Learning
Use Deep Learning When
- Large datasets (100,000+ samples)
- Image, audio, or text data
- Complex patterns that simpler models miss
- You have GPU resources
- State-of-the-art accuracy is critical
Use Traditional ML When
- Small to medium datasets
- Tabular/structured data
- Interpretability is important
- Limited compute resources
- Quick prototyping needed
Deep Learning Frameworks
TensorFlow / Keras
Google's framework. Keras provides high-level API. Great for production deployment.
PyTorch
Facebook's framework. Preferred for research. Dynamic computation graphs.
Scikit-learn MLP
Great for learning. Simple API. CPU only. Good for small-medium problems.
Common Neural Network Architectures
CNN (Convolutional Neural Networks)
Specialized for images. Uses filters to detect edges, shapes, and patterns. Powers image classification, object detection, and facial recognition.
RNN/LSTM (Recurrent Neural Networks)
Specialized for sequences. Has memory of previous inputs. Powers language models, translation, and time series forecasting.
Transformer
The architecture behind GPT, BERT, and modern LLMs. Uses attention mechanism. Revolutionary for NLP and increasingly for vision.
GAN (Generative Adversarial Networks)
Two networks compete: generator creates fake data, discriminator detects fakes. Powers image generation, style transfer, and deepfakes.
SVM vs MLP vs Deep Learning
| Aspect | SVM | MLP (sklearn) | Deep Learning |
|---|---|---|---|
| Best for | Small-medium data, clear margins | Learning, quick experiments | Large data, images, text, audio |
| Training speed | Medium (depends on kernel) | Fast (CPU) | Slow (but GPU accelerated) |
| Interpretability | Medium (support vectors) | Low | Very Low (black box) |
| Hyperparameters | C, gamma, kernel | Layers, neurons, learning rate | Many (architecture, optimizers, etc.) |
| Data requirements | Works with small data | Medium data | Needs large data |
Practice Questions: Algorithm Comparison
Test your understanding with these coding challenges.
Task: Train both SVM (RBF) and MLP on digits dataset and compare accuracy.
Solution:
from sklearn.svm import SVC
from sklearn.neural_network import MLPClassifier
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
digits = load_digits()
X_train, X_test, y_train, y_test = train_test_split(
digits.data, digits.target, test_size=0.2, random_state=42
)
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)
# SVM
svm = SVC(kernel='rbf', random_state=42)
svm.fit(X_train_scaled, y_train)
print(f"SVM Accuracy: {svm.score(X_test_scaled, y_test):.2%}")
# MLP
mlp = MLPClassifier(hidden_layer_sizes=(100,), max_iter=500, random_state=42)
mlp.fit(X_train_scaled, y_train)
print(f"MLP Accuracy: {mlp.score(X_test_scaled, y_test):.2%}")
Task: Time the training of SVM, MLP, and Random Forest on the same data.
Solution:
from sklearn.svm import SVC
from sklearn.neural_network import MLPClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
import time
digits = load_digits()
X_train, X_test, y_train, y_test = train_test_split(
digits.data, digits.target, test_size=0.2, random_state=42
)
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)
models = {
'SVM': SVC(random_state=42),
'MLP': MLPClassifier(hidden_layer_sizes=(100,), max_iter=500, random_state=42),
'Random Forest': RandomForestClassifier(n_estimators=100, random_state=42)
}
for name, model in models.items():
    start = time.time()
    model.fit(X_train_scaled, y_train)
    elapsed = time.time() - start
    acc = model.score(X_test_scaled, y_test)
    print(f"{name:15s}: {acc:.2%} (trained in {elapsed:.3f}s)")
Task: Create a Pipeline with StandardScaler and MLP, then use GridSearchCV to tune hidden_layer_sizes.
Solution:
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import GridSearchCV
from sklearn.datasets import load_digits
digits = load_digits()
# Create pipeline
pipe = Pipeline([
('scaler', StandardScaler()),
('mlp', MLPClassifier(max_iter=500, random_state=42))
])
# Grid search over architectures
param_grid = {
'mlp__hidden_layer_sizes': [(50,), (100,), (50, 50), (100, 50)]
}
grid = GridSearchCV(pipe, param_grid, cv=3, n_jobs=-1)
grid.fit(digits.data, digits.target)
print(f"Best architecture: {grid.best_params_}")
print(f"Best score: {grid.best_score_:.2%}")