Module 3.3

SVM & Neural Networks

Discover the power of Support Vector Machines and Neural Networks! Learn how SVMs find optimal decision boundaries using kernel tricks, and how multi-layer perceptrons learn complex patterns through backpropagation.

45 min read
Intermediate
Hands-on Examples
What You'll Learn
  • SVM classification and margin maximization
  • Kernel tricks (Linear, RBF, Polynomial)
  • SVM hyperparameter tuning (C, gamma)
  • Multi-layer Perceptrons with MLPClassifier
  • Deep learning introduction and when to use it
Contents
01

Support Vector Machines

Support Vector Machines (SVMs) are powerful classifiers that find the optimal hyperplane to separate classes. Unlike other classifiers that just find any separating boundary, SVMs find the boundary with the maximum margin - the largest possible distance between the boundary and the nearest data points from each class. This makes SVMs robust and great at generalizing to new data.

Road Analogy: Imagine drawing a road between two towns (classes). Other algorithms might draw any road that separates them. SVM draws the widest possible road - maximizing the "buffer zone" on each side. The wider the road, the less likely a new house will be misclassified as belonging to the wrong town.

What is a Hyperplane?

A hyperplane is a decision boundary that separates different classes. In 2D, it's a line. In 3D, it's a plane. In higher dimensions, we call it a hyperplane. SVM finds the hyperplane that maximizes the margin between classes.
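As a minimal sketch of this idea (using a synthetic two-blob dataset, not one from this module), a fitted linear SVC exposes its hyperplane through `coef_` and `intercept_`, and `decision_function` returns each point's signed score relative to that hyperplane:

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.svm import SVC

# Two well-separated blobs, so a linear hyperplane exists
X, y = make_blobs(n_samples=100, centers=2, random_state=42)

clf = SVC(kernel='linear', C=1.0)
clf.fit(X, y)

# The hyperplane is w . x + b = 0
w, b = clf.coef_[0], clf.intercept_[0]
print(f"w = {w}, b = {b:.3f}")

# decision_function computes w . x + b for each sample;
# its sign determines which side of the hyperplane (which class) a point is on
scores = clf.decision_function(X)
print((scores > 0).astype(int)[:5])
print(clf.predict(X)[:5])
```

Points with positive scores fall on one side of the hyperplane and points with negative scores on the other, which is exactly what `predict` reports.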

Support Vectors

The data points closest to the decision boundary. These "support" the hyperplane - if you remove other points, the boundary stays the same. Only support vectors matter for the final model.

Margin

The distance between the hyperplane and the nearest support vectors. SVM maximizes this margin, creating a "safety buffer" that improves generalization to new data.
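For a linear SVM, the margin width can be computed directly from the learned weights as 2 / ||w||. A quick sketch on synthetic separable data (the large C here approximates a hard margin; the dataset is illustrative, not one used elsewhere in this module):

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.svm import SVC

# Linearly separable toy data
X, y = make_blobs(n_samples=100, centers=2, cluster_std=0.8, random_state=42)

clf = SVC(kernel='linear', C=1000)  # very large C ~ hard margin
clf.fit(X, y)

# For a linear SVM the full margin width is 2 / ||w||
w = clf.coef_[0]
margin = 2 / np.linalg.norm(w)
print(f"Margin width: {margin:.3f}")

# Only a handful of points end up as support vectors
print(f"Support vectors per class: {clf.n_support_}")
```

Note how few support vectors remain: the rest of the training points could be deleted without moving the boundary.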

Hard Margin vs Soft Margin

In a perfect world, data is linearly separable and we can find a hyperplane with no misclassifications. This is called hard margin SVM. But real data is messy - some points may be on the wrong side. Soft margin SVM allows some misclassifications controlled by the C parameter.

# Basic SVM Classification
from sklearn.svm import SVC
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Load and prepare data
iris = load_iris()
X, y = iris.data[:, :2], iris.target  # Use first 2 features for visualization

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# SVM requires feature scaling! Fit the scaler on the training split only,
# so no test-set statistics leak into training
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)

# Train SVM with linear kernel
svm_clf = SVC(kernel='linear', C=1.0, random_state=42)
svm_clf.fit(X_train, y_train)

print(f"Training Accuracy: {svm_clf.score(X_train, y_train):.2%}")
print(f"Test Accuracy: {svm_clf.score(X_test, y_test):.2%}")
print(f"Number of Support Vectors: {svm_clf.n_support_}")
Critical: Always Scale Your Features! SVM is sensitive to feature scales because it measures distances. A feature ranging from 0-1000 will dominate one ranging from 0-1. Always use StandardScaler or MinMaxScaler before training SVM.
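To see why this matters, here is a rough before/after comparison on the wine dataset (chosen here because its features span wildly different ranges, e.g. proline is in the thousands while others are below 10; the exact accuracy gap will vary):

```python
from sklearn.svm import SVC
from sklearn.datasets import load_wine
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Wine features have very different scales, so distances are dominated
# by the largest-magnitude feature unless we scale
wine = load_wine()
X_train, X_test, y_train, y_test = train_test_split(
    wine.data, wine.target, test_size=0.2, random_state=42
)

# Without scaling
svm_raw = SVC(kernel='rbf', random_state=42).fit(X_train, y_train)
print(f"Unscaled: {svm_raw.score(X_test, y_test):.2%}")

# With scaling (fit the scaler on training data only)
scaler = StandardScaler()
svm_scaled = SVC(kernel='rbf', random_state=42).fit(
    scaler.fit_transform(X_train), y_train
)
print(f"Scaled:   {svm_scaled.score(scaler.transform(X_test), y_test):.2%}")
```

The scaled model typically wins by a wide margin on datasets like this.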

The C Parameter

The C parameter controls the trade-off between having a smooth decision boundary and classifying training points correctly. Think of C as "how much you care about mistakes":

Low C (e.g., 0.01)
  • Wider margin (more tolerance for errors)
  • More regularization
  • Simpler decision boundary
  • May underfit if too low
  • Better for noisy data
High C (e.g., 100)
  • Narrower margin (less tolerance)
  • Less regularization
  • Complex decision boundary
  • May overfit if too high
  • Better for clean data
# Effect of C parameter
from sklearn.svm import SVC

# Low C - wider margin, more misclassifications allowed
svm_low_c = SVC(kernel='linear', C=0.01, random_state=42)
svm_low_c.fit(X_train, y_train)
print(f"Low C (0.01): Train={svm_low_c.score(X_train, y_train):.2%}, Test={svm_low_c.score(X_test, y_test):.2%}")

# High C - narrow margin, fewer misclassifications
svm_high_c = SVC(kernel='linear', C=100, random_state=42)
svm_high_c.fit(X_train, y_train)
print(f"High C (100): Train={svm_high_c.score(X_train, y_train):.2%}, Test={svm_high_c.score(X_test, y_test):.2%}")

Practice Questions: SVM Basics

Test your understanding with these coding challenges.

Task: Load the breast cancer dataset, scale features, train a linear SVM with C=1.0, and print accuracy.

Show Solution
from sklearn.svm import SVC
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

data = load_breast_cancer()
X_train, X_test, y_train, y_test = train_test_split(
    data.data, data.target, test_size=0.2, random_state=42
)

scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)

svm = SVC(kernel='linear', C=1.0, random_state=42)
svm.fit(X_train_scaled, y_train)

print(f"Accuracy: {svm.score(X_test_scaled, y_test):.2%}")

Task: Train SVMs with C values [0.001, 0.1, 1, 10, 100] and compare train/test accuracy.

Show Solution
from sklearn.svm import SVC
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

data = load_breast_cancer()
X_train, X_test, y_train, y_test = train_test_split(
    data.data, data.target, test_size=0.2, random_state=42
)

scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)

C_values = [0.001, 0.1, 1, 10, 100]
for C in C_values:
    svm = SVC(kernel='linear', C=C, random_state=42)
    svm.fit(X_train_scaled, y_train)
    train_acc = svm.score(X_train_scaled, y_train)
    test_acc = svm.score(X_test_scaled, y_test)
    print(f"C={C:6.3f}: Train={train_acc:.2%}, Test={test_acc:.2%}")

Task: For C values [0.01, 1, 100], print the number of support vectors. What pattern do you notice?

Show Solution
from sklearn.svm import SVC
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

data = load_breast_cancer()
X_train, X_test, y_train, y_test = train_test_split(
    data.data, data.target, test_size=0.2, random_state=42
)

scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)

for C in [0.01, 1, 100]:
    svm = SVC(kernel='linear', C=C, random_state=42)
    svm.fit(X_train_scaled, y_train)
    n_sv = sum(svm.n_support_)
    print(f"C={C:5.2f}: {n_sv} support vectors")
# Lower C -> more support vectors (wider margin includes more points)
02

Kernel Tricks

What if your data isn't linearly separable? You can't draw a straight line (or hyperplane) to separate the classes. The kernel trick is SVM's secret weapon - it transforms data into a higher-dimensional space where it becomes linearly separable, without actually computing the transformation!

The Magic of Kernels: Imagine you have red and blue dots in a circle pattern - red in the center, blue around it. No straight line can separate them in 2D. But if you "lift" the center dots up (add a third dimension based on distance from center), suddenly a flat plane can separate them! Kernels do this mathematically without computing the actual transformation.
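The "lifting" in that analogy can be made concrete. In this sketch we build the circle pattern with make_circles, watch a linear SVM fail in 2D, then manually add a third feature (squared distance from the origin) and watch the same linear SVM succeed (the RBF kernel achieves this effect implicitly, without ever building the extra column):

```python
import numpy as np
from sklearn.datasets import make_circles
from sklearn.svm import SVC

# Red dots in the center, blue dots around them
X, y = make_circles(n_samples=200, noise=0.05, factor=0.5, random_state=42)

# Linear SVM on the raw 2D circles: no straight line separates them
linear_2d = SVC(kernel='linear').fit(X, y)
print(f"Linear, 2D:        {linear_2d.score(X, y):.2%}")

# "Lift" the data: add a third feature = squared distance from the origin
r2 = (X ** 2).sum(axis=1, keepdims=True)
X_lifted = np.hstack([X, r2])

# In 3D a flat plane (a fixed radius) now separates the classes
linear_3d = SVC(kernel='linear').fit(X_lifted, y)
print(f"Linear, lifted 3D: {linear_3d.score(X_lifted, y):.2%}")
```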

Common Kernel Types

Linear Kernel

No transformation. Works when data is already linearly separable. Fastest to train.

kernel='linear'
RBF Kernel

Radial Basis Function. The most popular choice. Creates smooth, flexible non-linear decision boundaries. A good default.

kernel='rbf'
Polynomial Kernel

Creates polynomial decision boundaries. Degree parameter controls complexity.

kernel='poly'
# Comparing Different Kernels
from sklearn.svm import SVC
from sklearn.datasets import make_moons
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Generate non-linear data
X, y = make_moons(n_samples=200, noise=0.15, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)

# Test different kernels
kernels = ['linear', 'rbf', 'poly']
for kernel in kernels:
    svm = SVC(kernel=kernel, random_state=42)
    svm.fit(X_train_scaled, y_train)
    acc = svm.score(X_test_scaled, y_test)
    print(f"{kernel:8s} kernel: {acc:.2%}")

The Gamma Parameter (RBF Kernel)

For the RBF kernel, the gamma parameter controls how far the influence of a single training example reaches. Think of it as how "local" vs "global" the decision boundary is:

Low Gamma (e.g., 0.01)
  • Each point has wide influence
  • Smoother, more global decision boundary
  • May underfit (too simple)
  • Less sensitive to individual points
High Gamma (e.g., 100)
  • Each point has narrow influence
  • Wiggly, more local decision boundary
  • May overfit (too complex)
  • Very sensitive to each point
# Effect of Gamma
from sklearn.svm import SVC

# Low gamma - smooth boundary
svm_low_gamma = SVC(kernel='rbf', gamma=0.01, random_state=42)
svm_low_gamma.fit(X_train_scaled, y_train)
print(f"Low gamma (0.01): {svm_low_gamma.score(X_test_scaled, y_test):.2%}")

# High gamma - wiggly boundary  
svm_high_gamma = SVC(kernel='rbf', gamma=100, random_state=42)
svm_high_gamma.fit(X_train_scaled, y_train)
print(f"High gamma (100): {svm_high_gamma.score(X_test_scaled, y_test):.2%}")

# Default gamma='scale' often works well
svm_auto = SVC(kernel='rbf', gamma='scale', random_state=42)
svm_auto.fit(X_train_scaled, y_train)
print(f"Auto gamma: {svm_auto.score(X_test_scaled, y_test):.2%}")
Pro Tip: Start with gamma='scale' (default) which uses 1 / (n_features * X.var()). This adapts to your data automatically and is a good baseline before tuning.
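You can compute the value that gamma='scale' resolves to yourself and verify that passing it explicitly produces the same model (this sketch reuses the scaled digits data; the equivalence assumes a dense array, where scikit-learn uses the formula above):

```python
import numpy as np
from sklearn.datasets import load_digits
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

digits = load_digits()
X = StandardScaler().fit_transform(digits.data)

# What gamma='scale' resolves to for this data: 1 / (n_features * X.var())
gamma_scale = 1 / (X.shape[1] * X.var())
print(f"gamma='scale' -> {gamma_scale:.5f}")

# Passing the computed value explicitly trains an identical model
svm_scale = SVC(kernel='rbf', gamma='scale').fit(X, digits.target)
svm_manual = SVC(kernel='rbf', gamma=gamma_scale).fit(X, digits.target)
print(np.array_equal(svm_scale.predict(X), svm_manual.predict(X)))
```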

Practice Questions: Kernel Tricks

Test your understanding with these coding challenges.

Task: Load digits dataset, scale features, train SVM with RBF kernel, print accuracy.

Show Solution
from sklearn.svm import SVC
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

digits = load_digits()
X_train, X_test, y_train, y_test = train_test_split(
    digits.data, digits.target, test_size=0.2, random_state=42
)

scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)

svm = SVC(kernel='rbf', random_state=42)
svm.fit(X_train_scaled, y_train)

print(f"Accuracy: {svm.score(X_test_scaled, y_test):.2%}")

Task: Use make_circles() to generate data, compare linear, rbf, and poly kernels.

Show Solution
from sklearn.svm import SVC
from sklearn.datasets import make_circles
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

X, y = make_circles(n_samples=200, noise=0.1, factor=0.5, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)

for kernel in ['linear', 'rbf', 'poly']:
    svm = SVC(kernel=kernel, random_state=42)
    svm.fit(X_train_scaled, y_train)
    print(f"{kernel:8s}: {svm.score(X_test_scaled, y_test):.2%}")
# RBF will perform best on circular data!

Task: Use cross_val_score to find the best gamma from [0.001, 0.01, 0.1, 1, 10].

Show Solution
from sklearn.svm import SVC
from sklearn.datasets import load_digits
from sklearn.model_selection import cross_val_score
from sklearn.preprocessing import StandardScaler
import numpy as np

digits = load_digits()
scaler = StandardScaler()
X_scaled = scaler.fit_transform(digits.data)

gamma_values = [0.001, 0.01, 0.1, 1, 10]
best_gamma, best_score = None, 0

for gamma in gamma_values:
    svm = SVC(kernel='rbf', gamma=gamma, random_state=42)
    scores = cross_val_score(svm, X_scaled, digits.target, cv=5)
    mean_score = scores.mean()
    print(f"gamma={gamma:5.3f}: {mean_score:.2%} (+/- {scores.std()*2:.2%})")
    if mean_score > best_score:
        best_gamma, best_score = gamma, mean_score

print(f"\nBest: gamma={best_gamma} with {best_score:.2%}")
03

SVM Hyperparameter Tuning

SVM has two main hyperparameters to tune: C (regularization) and gamma (kernel width for RBF). Finding the right combination is crucial for good performance. Grid Search and Random Search are the standard approaches, and scikit-learn makes this easy with GridSearchCV.

Grid Search for SVM

Grid Search exhaustively tries all combinations of hyperparameters you specify. For SVM, we typically search over C and gamma values on a logarithmic scale:

# Grid Search for SVM Hyperparameters
from sklearn.svm import SVC
from sklearn.model_selection import GridSearchCV
from sklearn.datasets import load_digits
from sklearn.preprocessing import StandardScaler

digits = load_digits()
scaler = StandardScaler()
X_scaled = scaler.fit_transform(digits.data)

# Define parameter grid (logarithmic scale)
param_grid = {
    'C': [0.1, 1, 10, 100],
    'gamma': [0.001, 0.01, 0.1, 1],
    'kernel': ['rbf']
}

# Grid Search with 5-fold cross-validation
svm = SVC(random_state=42)
grid_search = GridSearchCV(svm, param_grid, cv=5, scoring='accuracy', n_jobs=-1)
grid_search.fit(X_scaled, digits.target)

print(f"Best Parameters: {grid_search.best_params_}")
print(f"Best CV Score: {grid_search.best_score_:.2%}")

Visualizing the Parameter Space

# Visualize Grid Search Results
import pandas as pd

results = pd.DataFrame(grid_search.cv_results_)
pivot = results.pivot(index='param_C', columns='param_gamma', values='mean_test_score')
print("\nAccuracy for each C/gamma combination:")
print(pivot.round(3))
Tuning Strategy: Start with a coarse grid (powers of 10: 0.01, 0.1, 1, 10, 100) to find the right region, then zoom in with a finer grid around the best values.
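A sketch of that two-stage strategy on the breast cancer dataset (the x3 zoom factors around the coarse winner are illustrative, not a fixed rule):

```python
from sklearn.svm import SVC
from sklearn.model_selection import GridSearchCV
from sklearn.datasets import load_breast_cancer
from sklearn.preprocessing import StandardScaler

data = load_breast_cancer()
X = StandardScaler().fit_transform(data.data)

# Stage 1: coarse grid on a logarithmic scale
coarse = GridSearchCV(
    SVC(kernel='rbf'),
    {'C': [0.01, 0.1, 1, 10, 100], 'gamma': [0.001, 0.01, 0.1, 1]},
    cv=5, n_jobs=-1
)
coarse.fit(X, data.target)
best_C = coarse.best_params_['C']
best_gamma = coarse.best_params_['gamma']
print(f"Coarse best: C={best_C}, gamma={best_gamma}")

# Stage 2: finer grid zoomed in around the coarse winner
fine = GridSearchCV(
    SVC(kernel='rbf'),
    {'C': [best_C / 3, best_C, best_C * 3],
     'gamma': [best_gamma / 3, best_gamma, best_gamma * 3]},
    cv=5, n_jobs=-1
)
fine.fit(X, data.target)
print(f"Fine best:   {fine.best_params_}, CV score {fine.best_score_:.2%}")
```

Because the fine grid still contains the coarse winner, the second stage can only match or improve the cross-validation score.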

Using the Best Model

# Use the best model for predictions
from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(
    X_scaled, digits.target, test_size=0.2, random_state=42
)

# Refit the search on the training split only, so the
# held-out test set stays truly unseen
grid_search.fit(X_train, y_train)

# best_estimator_ is already refitted on the full training split
best_svm = grid_search.best_estimator_
print(f"Test Accuracy: {best_svm.score(X_test, y_test):.2%}")

# Or create a new model with the best params
final_svm = SVC(**grid_search.best_params_, random_state=42)
final_svm.fit(X_train, y_train)
print(f"Final Model Accuracy: {final_svm.score(X_test, y_test):.2%}")

Practice Questions: Hyperparameter Tuning

Test your understanding with these coding challenges.

Task: Use GridSearchCV to find optimal C from [0.1, 1, 10] for linear SVM.

Show Solution
from sklearn.svm import SVC
from sklearn.model_selection import GridSearchCV
from sklearn.datasets import load_breast_cancer
from sklearn.preprocessing import StandardScaler

data = load_breast_cancer()
scaler = StandardScaler()
X_scaled = scaler.fit_transform(data.data)

param_grid = {'C': [0.1, 1, 10], 'kernel': ['linear']}
grid = GridSearchCV(SVC(random_state=42), param_grid, cv=5)
grid.fit(X_scaled, data.target)

print(f"Best C: {grid.best_params_['C']}")
print(f"Best Score: {grid.best_score_:.2%}")

Task: Include both kernels in grid search. For RBF, also tune gamma.

Show Solution
from sklearn.svm import SVC
from sklearn.model_selection import GridSearchCV
from sklearn.datasets import load_breast_cancer
from sklearn.preprocessing import StandardScaler

data = load_breast_cancer()
scaler = StandardScaler()
X_scaled = scaler.fit_transform(data.data)

# Different params for different kernels
param_grid = [
    {'kernel': ['linear'], 'C': [0.1, 1, 10]},
    {'kernel': ['rbf'], 'C': [0.1, 1, 10], 'gamma': [0.01, 0.1, 1]}
]

grid = GridSearchCV(SVC(random_state=42), param_grid, cv=5)
grid.fit(X_scaled, data.target)

print(f"Best Params: {grid.best_params_}")
print(f"Best Score: {grid.best_score_:.2%}")

Task: Use RandomizedSearchCV with loguniform distributions for C and gamma.

Show Solution
from sklearn.svm import SVC
from sklearn.model_selection import RandomizedSearchCV
from sklearn.datasets import load_digits
from sklearn.preprocessing import StandardScaler
from scipy.stats import loguniform

digits = load_digits()
scaler = StandardScaler()
X_scaled = scaler.fit_transform(digits.data)

param_dist = {
    'C': loguniform(0.01, 100),
    'gamma': loguniform(0.001, 10),
    'kernel': ['rbf']
}

random_search = RandomizedSearchCV(
    SVC(random_state=42), param_dist, 
    n_iter=20, cv=5, random_state=42, n_jobs=-1
)
random_search.fit(X_scaled, digits.target)

print(f"Best Params: {random_search.best_params_}")
print(f"Best Score: {random_search.best_score_:.2%}")
04

Multi-layer Perceptrons

Multi-layer Perceptrons (MLPs) are the simplest form of neural networks. They consist of layers of interconnected neurons that learn to recognize patterns through a process called backpropagation. Scikit-learn provides MLPClassifier for easy neural network training without needing deep learning frameworks like TensorFlow or PyTorch.

Brain Analogy: Think of an MLP like a simplified brain. Input neurons receive data (like your eyes seeing an image). Hidden layers process and combine this information (like your visual cortex recognizing shapes). Output neurons give the final answer (like your brain saying "that's a cat"). During training, the network adjusts its connections to get better at the task.

MLP Architecture

Input Layer

One neuron per feature. Receives raw data. No activation function - just passes data through.

Hidden Layers

Where the "learning" happens. Each neuron combines inputs with weights and applies an activation function (ReLU, tanh).

Output Layer

One neuron per class (for classification). Uses softmax to output probabilities that sum to 1.
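You can inspect this softmax output directly via predict_proba. A quick sketch on the Iris data (the small 20-neuron layer here is arbitrary, chosen only to keep training fast):

```python
from sklearn.neural_network import MLPClassifier
from sklearn.datasets import load_iris
from sklearn.preprocessing import StandardScaler

iris = load_iris()
X = StandardScaler().fit_transform(iris.data)

mlp = MLPClassifier(hidden_layer_sizes=(20,), max_iter=500, random_state=42)
mlp.fit(X, iris.target)

# One probability per class (3 columns for Iris), each row summing to 1
proba = mlp.predict_proba(X[:3])
print(proba.round(3))
print(proba.sum(axis=1))
```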

# Basic MLP Classifier
from sklearn.neural_network import MLPClassifier
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

digits = load_digits()
X_train, X_test, y_train, y_test = train_test_split(
    digits.data, digits.target, test_size=0.2, random_state=42
)

# Neural networks need scaled data!
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)

# MLP with 2 hidden layers (100 and 50 neurons)
mlp = MLPClassifier(
    hidden_layer_sizes=(100, 50),  # Two hidden layers
    activation='relu',              # ReLU activation function
    max_iter=500,                   # Maximum training iterations
    random_state=42
)
mlp.fit(X_train_scaled, y_train)

print(f"Training Accuracy: {mlp.score(X_train_scaled, y_train):.2%}")
print(f"Test Accuracy: {mlp.score(X_test_scaled, y_test):.2%}")
print(f"Number of iterations: {mlp.n_iter_}")

Key Hyperparameters

hidden_layer_sizes

Tuple defining the number of neurons in each hidden layer.

  • (100,) - One layer with 100 neurons
  • (100, 50) - Two layers: 100 then 50
  • (100, 100, 100) - Three layers of 100 each
activation

Activation function for hidden layers.

  • 'relu' - Default, fast, works well (ReLU)
  • 'tanh' - Outputs between -1 and 1
  • 'logistic' - Sigmoid, outputs 0 to 1
# Experimenting with Architectures
architectures = [
    (50,),           # Shallow: 1 layer, 50 neurons
    (100, 50),       # Medium: 2 layers
    (100, 100, 50),  # Deep: 3 layers
]

for arch in architectures:
    mlp = MLPClassifier(hidden_layer_sizes=arch, max_iter=500, random_state=42)
    mlp.fit(X_train_scaled, y_train)
    acc = mlp.score(X_test_scaled, y_test)
    print(f"Architecture {str(arch):20s}: {acc:.2%}")

Regularization and Learning Rate

# Preventing Overfitting with Regularization
mlp_reg = MLPClassifier(
    hidden_layer_sizes=(100, 50),
    alpha=0.01,              # L2 regularization (higher = more regularization)
    learning_rate='adaptive', # Adjust learning rate during training
    learning_rate_init=0.001, # Initial learning rate
    early_stopping=True,      # Stop when validation score stops improving
    validation_fraction=0.1,  # Use 10% of training data for validation
    n_iter_no_change=10,      # Stop after 10 iterations without improvement
    max_iter=500,
    random_state=42
)
mlp_reg.fit(X_train_scaled, y_train)

print(f"Accuracy: {mlp_reg.score(X_test_scaled, y_test):.2%}")
print(f"Stopped at iteration: {mlp_reg.n_iter_}")
Common Pitfalls:
  • Not scaling data: MLPs are very sensitive to feature scales
  • Not enough iterations: Increase max_iter if you see convergence warnings
  • Too many neurons: Start small and increase if underfitting
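The "not enough iterations" pitfall is easy to reproduce. This sketch deliberately sets max_iter far too low so training stops before the loss converges and scikit-learn emits a ConvergenceWarning (captured here just to show it fired):

```python
import warnings
from sklearn.exceptions import ConvergenceWarning
from sklearn.neural_network import MLPClassifier
from sklearn.datasets import load_digits
from sklearn.preprocessing import StandardScaler

digits = load_digits()
X = StandardScaler().fit_transform(digits.data)

# Deliberately too few iterations: the optimizer hits the cap mid-descent
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always", ConvergenceWarning)
    mlp = MLPClassifier(hidden_layer_sizes=(100,), max_iter=5, random_state=42)
    mlp.fit(X, digits.target)

print(f"ConvergenceWarnings raised: {len(caught)}")
print(f"Stopped at iteration {mlp.n_iter_} of max_iter=5")
```

When you see this warning in practice, raise max_iter (or enable early_stopping) rather than ignoring it.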

Practice Questions: Multi-layer Perceptrons

Test your understanding with these coding challenges.

Task: Train MLPClassifier with one hidden layer of 50 neurons on scaled Iris data.

Show Solution
from sklearn.neural_network import MLPClassifier
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

iris = load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.2, random_state=42
)

scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)

mlp = MLPClassifier(hidden_layer_sizes=(50,), max_iter=500, random_state=42)
mlp.fit(X_train_scaled, y_train)

print(f"Accuracy: {mlp.score(X_test_scaled, y_test):.2%}")

Task: Compare 'relu', 'tanh', and 'logistic' activations on the digits dataset.

Show Solution
from sklearn.neural_network import MLPClassifier
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

digits = load_digits()
X_train, X_test, y_train, y_test = train_test_split(
    digits.data, digits.target, test_size=0.2, random_state=42
)

scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)

for activation in ['relu', 'tanh', 'logistic']:
    mlp = MLPClassifier(
        hidden_layer_sizes=(100,), activation=activation,
        max_iter=500, random_state=42
    )
    mlp.fit(X_train_scaled, y_train)
    print(f"{activation:10s}: {mlp.score(X_test_scaled, y_test):.2%}")

Task: Train MLP with early_stopping=True and print the loss curve length.

Show Solution
from sklearn.neural_network import MLPClassifier
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

digits = load_digits()
X_train, X_test, y_train, y_test = train_test_split(
    digits.data, digits.target, test_size=0.2, random_state=42
)

scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)

mlp = MLPClassifier(
    hidden_layer_sizes=(100, 50),
    early_stopping=True,
    validation_fraction=0.1,
    n_iter_no_change=10,
    max_iter=1000,
    random_state=42
)
mlp.fit(X_train_scaled, y_train)

print(f"Accuracy: {mlp.score(X_test_scaled, y_test):.2%}")
print(f"Iterations: {mlp.n_iter_}")
print(f"Loss curve length: {len(mlp.loss_curve_)}")
print(f"Final loss: {mlp.loss_curve_[-1]:.4f}")
05

Deep Learning Introduction

Deep learning is a subset of machine learning that uses neural networks with many layers (hence "deep"). While MLPClassifier is great for learning, real-world deep learning uses specialized frameworks like TensorFlow, PyTorch, or Keras that offer GPU acceleration, more layer types, and advanced architectures.

When to Use Deep Learning

Use Deep Learning When
  • Large datasets (100,000+ samples)
  • Image, audio, or text data
  • Complex patterns that simpler models miss
  • You have GPU resources
  • State-of-the-art accuracy is critical
Use Traditional ML When
  • Small to medium datasets
  • Tabular/structured data
  • Interpretability is important
  • Limited compute resources
  • Quick prototyping needed

Deep Learning Frameworks

TensorFlow / Keras

Google's framework. Keras provides high-level API. Great for production deployment.

PyTorch

Meta's (formerly Facebook's) framework. Preferred in research. Dynamic computation graphs.

Scikit-learn MLP

Great for learning. Simple API. CPU only. Good for small-medium problems.

Common Neural Network Architectures

CNN (Convolutional Neural Networks)

Specialized for images. Uses filters to detect edges, shapes, and patterns. Powers image classification, object detection, and facial recognition.

RNN/LSTM (Recurrent Neural Networks)

Specialized for sequences. Has memory of previous inputs. Powers language models, translation, and time series forecasting.

Transformer

The architecture behind GPT, BERT, and modern LLMs. Uses attention mechanism. Revolutionary for NLP and increasingly for vision.

GAN (Generative Adversarial Networks)

Two networks compete: generator creates fake data, discriminator detects fakes. Powers image generation, style transfer, and deepfakes.

SVM vs MLP vs Deep Learning

Aspect            | SVM                              | MLP (sklearn)                   | Deep Learning
Best for          | Small-medium data, clear margins | Learning, quick experiments     | Large data, images, text, audio
Training speed    | Medium (depends on kernel)       | Fast (CPU)                      | Slow (but GPU accelerated)
Interpretability  | Medium (support vectors)         | Low                             | Very low (black box)
Hyperparameters   | C, gamma, kernel                 | Layers, neurons, learning rate  | Many (architecture, optimizers, etc.)
Data requirements | Works with small data            | Medium data                     | Needs large data
Practical Advice: For tabular data and structured problems, start with traditional ML (Random Forest, XGBoost, SVM). They're faster to train, easier to interpret, and often perform just as well. Save deep learning for images, text, and truly large datasets where it shines.

Practice Questions: Algorithm Comparison

Test your understanding with these coding challenges.

Task: Train both SVM (RBF) and MLP on digits dataset and compare accuracy.

Show Solution
from sklearn.svm import SVC
from sklearn.neural_network import MLPClassifier
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

digits = load_digits()
X_train, X_test, y_train, y_test = train_test_split(
    digits.data, digits.target, test_size=0.2, random_state=42
)

scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)

# SVM
svm = SVC(kernel='rbf', random_state=42)
svm.fit(X_train_scaled, y_train)
print(f"SVM Accuracy: {svm.score(X_test_scaled, y_test):.2%}")

# MLP
mlp = MLPClassifier(hidden_layer_sizes=(100,), max_iter=500, random_state=42)
mlp.fit(X_train_scaled, y_train)
print(f"MLP Accuracy: {mlp.score(X_test_scaled, y_test):.2%}")

Task: Time the training of SVM, MLP, and Random Forest on the same data.

Show Solution
from sklearn.svm import SVC
from sklearn.neural_network import MLPClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
import time

digits = load_digits()
X_train, X_test, y_train, y_test = train_test_split(
    digits.data, digits.target, test_size=0.2, random_state=42
)

scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)

models = {
    'SVM': SVC(random_state=42),
    'MLP': MLPClassifier(hidden_layer_sizes=(100,), max_iter=500, random_state=42),
    'Random Forest': RandomForestClassifier(n_estimators=100, random_state=42)
}

for name, model in models.items():
    start = time.time()
    model.fit(X_train_scaled, y_train)
    elapsed = time.time() - start
    acc = model.score(X_test_scaled, y_test)
    print(f"{name:15s}: {acc:.2%} (trained in {elapsed:.3f}s)")

Task: Create a Pipeline with StandardScaler and MLP, then use GridSearchCV to tune hidden_layer_sizes.

Show Solution
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import GridSearchCV
from sklearn.datasets import load_digits

digits = load_digits()

# Create pipeline
pipe = Pipeline([
    ('scaler', StandardScaler()),
    ('mlp', MLPClassifier(max_iter=500, random_state=42))
])

# Grid search over architectures
param_grid = {
    'mlp__hidden_layer_sizes': [(50,), (100,), (50, 50), (100, 50)]
}

grid = GridSearchCV(pipe, param_grid, cv=3, n_jobs=-1)
grid.fit(digits.data, digits.target)

print(f"Best architecture: {grid.best_params_}")
print(f"Best score: {grid.best_score_:.2%}")

Key Takeaways

SVM Margin Maximization

SVMs find the hyperplane with maximum margin between classes, making them robust and great at generalizing

Kernel Trick

Kernels transform data to higher dimensions where it becomes linearly separable without computing the transformation

C and Gamma Tuning

C controls regularization (margin width), gamma controls kernel width. Use GridSearchCV to find optimal values

Neural Networks

MLPs learn through layers of neurons and backpropagation. hidden_layer_sizes defines the architecture

Scale Your Data

Both SVM and MLP are sensitive to feature scales. Always use StandardScaler before training

Deep Learning When Needed

Use deep learning for images, text, and large datasets. For tabular data, traditional ML often works better

Knowledge Check

Test your understanding of SVM and Neural Networks:

Question 1 of 9

What is the main goal of Support Vector Machines?

Question 2 of 9

What happens when you increase the C parameter in SVM?

Question 3 of 9

What does the kernel trick allow SVM to do?

Question 4 of 9

What is the effect of a high gamma value in the RBF kernel?

Question 5 of 9

Why is feature scaling important for SVM?

Question 6 of 9

What does hidden_layer_sizes=(100, 50) mean in MLPClassifier?

Question 7 of 9

What is the purpose of the alpha parameter in MLPClassifier?

Question 8 of 9

When should you prefer traditional ML over deep learning?

Question 9 of 9

What does early_stopping do in MLPClassifier?